I assume they're doing "Structured Generation" or "Guided generation", which has been possible for a while if you control the LLM itself e.g. running an OSS model, e.g. [0][1]. It's cool to see a major API provider offer it, though.
The basic idea is: at each auto-regressive step (each token generation), instead of letting the model generate a probability distribution over "all tokens in the entire vocab it's ever seen" (the default), only allow the model to generate a probability distribution over "this specific set of tokens I provide". And that set can change from one sampling step to the next, according to a given grammar. E.g. if you're using a JSON grammar, and you've just generated a `{`, you can provide the model a choice of only the tokens that are valid JSON immediately after a `{`, etc.
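A minimal sketch of that masking step (my own illustration, not the actual API of either library below; `grammar.allowed_tokens` is a hypothetical interface):

    import numpy as np

    def constrained_sample(logits, allowed_token_ids, temperature=1.0):
        # logits: unnormalized scores over the full vocabulary, shape (vocab_size,)
        # allowed_token_ids: token ids the grammar permits at this step
        mask = np.full(logits.shape, -np.inf)   # disallow everything...
        mask[allowed_token_ids] = 0.0           # ...except the grammar-approved tokens
        masked = (logits + mask) / temperature
        probs = np.exp(masked - masked.max())   # softmax restricted to allowed tokens
        probs /= probs.sum()
        return int(np.random.choice(len(logits), p=probs))

    # Hypothetical decode loop: right after a `{` in a JSON grammar, `allowed` might
    # contain only the ids for '"', '}' and whitespace tokens.
    # while not done:
    #     logits = model(prompt_ids + generated_ids)[-1]
    #     allowed = grammar.allowed_tokens(generated_ids)   # assumed interface
    #     generated_ids.append(constrained_sample(logits, allowed))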
n=1 here, though I've heard others say the same -- but I (fairly healthy 30s male, vaccinated) found Paxlovid massively reduced symptom intensity for me. Within a day my symptoms went from "top 5 fevers I've ever experienced, normal function significantly impaired" to "feels like a cold; can reasonably handle myself around the house and even take a software engineering interview".
I most likely would not have got a severe infection, and probably would not have got Long Covid, given my age / health / vaccine status, even if I hadn't taken it; but nonetheless I'm glad I was able to get it. Definitely worth the weird taste (hard candy helps).
How do you know that it was Paxlovid that made you better, rather than coincidentally convalescing? There's a tendency for fevers to break as the body gets the upper hand on an infection, so if you started taking Paxlovid during a high fever your body might have been nearly done mopping up.
I'm not saying Paxlovid didn't help you, just that it's tricky to distinguish from placebo without a study.
It worked extremely fast for me; I took it within 16 hours of symptoms that were increasing rapidly. Within 6-8 hours I started feeling much better, and was instead just dealing with the Paxlovid side effects. I don’t think my body fought off Covid that fast. Anecdotal of course, but I’d take it again in a heartbeat. I did get a rebound infection the next week, but I couldn’t feel it; I just tested positive for a couple of days.
> I took it within 16 hours of symptoms that were increasing rapidly. Within 6-8 hours I started feeling much better
So about 24 hours? I took nothing when I got COVID, and the major fever and body/head aches only lasted about that long. One day I started feeling absolutely awful, and I woke up the next day feeling substantially better but unable to smell anything but smoke for the next week.
It is possible that the paxlovid helped you, but given the few details you've shared so far it's also possible that it didn't do much that wasn't already going to happen.
It was dramatic and started just a few hours after the first dose. I was worsening all morning, and by mid-afternoon there was a reversal, coinciding with the Paxlovid side effects. It wasn’t an overnight thing where rest was a factor; I was awake and bedridden. So I’m pretty convinced, enough to drop $1,500 retail for it in the future if need be.
Interestingly, I had the weird side effect of going from a COVID fever to a low body temperature by my 2nd-3rd day of Paxlovid. If I remember correctly, it was in the 95-96°F range.
My doc advised me to stop taking it, but after reading on Reddit that a few others had had a similar experience, I decided to finish the entire course.
Same here. I got really sick the second time I had Covid (despite it being a more “mild” strain, Omicron I believe): I was bedridden for almost two weeks and had a rebound fever. The third time I got Covid I took Paxlovid, and most of my symptoms subsided after a few days of taking it and I didn’t have a fever.
Obviously I don’t know for sure how much I can attribute to the medication, but I will be taking it again if I catch Covid.
Both times I had COVID I went from "worst fever ever / my body feels like I've been hit by a bus" to "I have a pretty average cold" in less than 36 hours. Utterly bizarre. (Vaccinated, good health, 40s male)
Edit to clarify: I didn't take anything other than paracetamol and ibuprofen.
I did not take anything special either, I'm vaccinated, and I had more or less the same experience. I went from not even being able to sleep at night due to the pain to feeling great the very next afternoon. Completely bizarre. None of the people I know experienced it like this, though.
This is why properly controlled trials are needed for stuff like that. It is easy to attribute the change to whatever random thing I tried at that point out of desperation.
Trying to make the same transition right now. I’ve got almost 10 years of experience with Python and data engineering, and I’ve been reading tutorials and playing with projects on the side.
I think I’ve got a grasp of the fundamentals and the ability to learn fast on the job, but every MLE job listing I see wants “4+ years of experience training and deploying models in a production environment” or something (even non-Senior roles!). I’m not sure how to break in: I need an MLE job to get the experience to get an MLE job. Does anyone have any advice?
Just the standard advice: it's usually much easier to switch into a new role (MLE, manager, ...) at your existing company than to land those roles directly at a new company. So if your current employer does not employ any MLEs, consider joining another company that does, but apply for a role you're currently well qualified for, and then try to make the switch internally. Consider being up-front about that in the hiring process to get signal on how supportive a company is.
a) a linear SSM (a form of RNN?) is equivalent to Attention without the scaling and softmax; and
b) Attention is "all you need" and the thing that made Transformers radically outperform all the previous architectures like LSTMs that used to dominate NLP;
does that imply c) the scaling and softmax parts of the attention equation, in particular, are the magic touch that makes Transformers work so well?
The major difference is that the transformer's state grows as the sequence gets longer, while recurrent models use a fixed-size state. So presumably at sequence length (T) > size of state space (N), the transformer will be better on some very specific tasks; not all of them, but especially those that require the model to select information from the beginning of the sequence conditional on something at the end of the sequence. Transformers can refocus at any time, while SSMs need to guess right from the start what to keep and what to drop. SSMs could use the old trick of repeating the input twice to allow the end to condition on the beginning as well.
An important role is played by the softmax function, which normalizes the attention scores, allowing the model to weigh different parts of the input sequence dynamically. This means that, unlike RNNs, which sequentially process inputs and update states, Transformers can directly access and prioritize information from any part of the sequence, and they are not slower for T < N.
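To make the a)/b) contrast concrete, here's a rough numpy sketch (my own illustration, not from the thread): full attention rescales and softmaxes over the whole history at every step, whereas dropping the scaling and softmax gives a "linear attention" that factors into a recurrence with a fixed-size state, i.e. an RNN/SSM-like form.

    import numpy as np

    def softmax_attention(Q, K, V):
        # Standard attention: scale, softmax, then mix values.
        # The "state" is the full K/V cache, which grows with sequence length T.
        # (Bidirectional for brevity; a causal mask would be applied in practice.)
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)                    # (T, T)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)               # softmax over the history
        return w @ V

    def linear_attention_recurrent(Q, K, V):
        # Drop the scaling and softmax: (Q K^T) V == Q (K^T V), so the whole thing
        # runs as a recurrence with a fixed-size (d x d) state, independent of T.
        d = Q.shape[-1]
        S = np.zeros((d, d))                             # fixed-size state
        out = np.zeros_like(V)
        for t in range(Q.shape[0]):                      # causal, one token at a time
            S += np.outer(K[t], V[t])                    # accumulate K^T V
            out[t] = Q[t] @ S
        return out

    T, d = 8, 4
    Q, K, V = (np.random.randn(T, d) for _ in range(3))
    # The causal recurrence equals masked (Q K^T) V with no softmax:
    assert np.allclose(linear_attention_recurrent(Q, K, V), np.tril(Q @ K.T) @ V)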
Doesn’t this effect happen with EVs generally, not just Teslas? (I don’t particularly mean to defend Tesla here, but I wonder if this might be misleading)
When I see an embedded DSL passed around as strings like this I can't help but think "this could be its own programming language"
Then it could have syntax highlighting, auto complete, and so on. The type system for such a language could possibly include verifying shapes at compile time.
What would a language made up of .ein source files for manipulating tensors, which compiles down to the same low-level ops, look like?
No need for .ein source files. We just need a programming language that allows the definition of embedded DSLs without shoving them into one-line strings. A language like Common Lisp.
This is also how it works in Julia, where macros digest notation for einsum-like operations before compile-time. In fact the linked file's explanatory comment:
(einsum (A i j) (B i k) (C k j))
results in the update
A[i,j] = \sum_k B[i,k] C[k,j],
which is equivalent to matrix multiplication.
very nearly contains the syntax used by all the Julia packages (where @ marks a macro), which is

    @einsum A[i,j] := B[i,k] * C[k,j]
I wrote a library in C++ (I know, probably a non-starter for most reading this) that I think does most of what you want, as well as some other requests in this thread (generalized to more than just multiply-add): https://github.com/dsharlet/array?tab=readme-ov-file#einstei....
A matrix multiply written with this looks like this:
enum { i = 2, j = 0, k = 1 };  // the enumerator values pick the loop order: j is loop 0, so the j loop is innermost
auto C = make_ein_sum<float, i, j>(ein<i, k>(A) * ein<k, j>(B));  // C(i, j) = sum over k of A(i, k) * B(k, j)
Where A and B are 2D arrays. This is strongly typed all the way through, so you get a lot of feedback at compile time, and C is a 2D array object at compile time. It is possible to make C++ template errors reasonable with enable_if and the like; this works well-ish on clang, but not so well on GCC (YMMV).
This library is a lot less automated than most other einsum implementations. You have to explicitly control the loop ordering (in the example above, the `j` loop is innermost because it is loop 0). If you build a good loop order for your shapes, the compiler will probably autovectorize your inner loop, and you'll get pretty good performance. Control over the loop ordering is in general a useful tool, but it's probably a lot lower level than most users want.
Einsums are the regexes of tensor programming. They should be avoided at all costs IMO. Ideally we should be able to write native loops that get auto-vectorized into einsums, for which there is a CUDA/PTX-emitting factory. But for some reason neither PyTorch nor JAX/TF took this route, and now we are here.
Some of the einsum expressions I have seen for grouped multi-headed/query attention are mind-boggling, and they get shipped to prod.
JAX kind of did take this route, no? The main issue is that it’s going to be hard/impossible to compile Python loops to GPU kernels. It’s also maybe not the most ergonomic solution, which is why there is shorthand like einsum. Einsum can be much more clear than a loop because what it can do is so much more limited.
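As an illustration of that tradeoff (my own toy example, not from the thread), here is the same batched attention-score contraction written as explicit loops and as a single einsum call in numpy:

    import numpy as np

    B, H, T, D = 2, 4, 8, 16
    q = np.random.randn(B, H, T, D)   # queries: (batch, heads, time, head_dim)
    k = np.random.randn(B, H, T, D)   # keys

    # Explicit loops: every index and its range is spelled out, but it's verbose
    # (and far too slow in pure Python to ever ship).
    scores_loops = np.zeros((B, H, T, T))
    for b in range(B):
        for h in range(H):
            for i in range(T):
                for j in range(T):
                    scores_loops[b, h, i, j] = q[b, h, i] @ k[b, h, j]

    # One einsum: compact and fast, but the reader has to decode the index string.
    scores_einsum = np.einsum("bhid,bhjd->bhij", q, k)

    assert np.allclose(scores_loops, scores_einsum)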
JAX tries to be a functional language that has a Python front end. The problem is if you are outside Google and don't really understand the XLA compiler, then you are screwed.
They're commoditizing their complement [0][1], inasmuch as LLMs are a complement of social media and advertising (which I think they are).
They've made it harder for competitors like Google or TikTok to compete with Meta on the basis of "we have a super secret proprietary AI that no one else has that's leagues better than anything else". If everyone has access to a high quality AI (perhaps not the world's best, but competitive), then no one -- including their competitors -- has a competitive advantage from having exclusive access to high quality AI.
And the content industry will grow ever more addictive and profitable, with content curated and customized specifically for your psyche. That happens to be the very industry whose growth Meta, among all the tech giants, stands to benefit from the most.
> will make most humans' (economic) value evaporate for the same reason
With one hand it takes, with the other it gives: AI will be in everyone's pocket, capable of serving our needs at a superhuman level; the thing is, you can't copy a billion dollars, but you can copy a LLaMA.
> OpenAI’s mission is to ensure that artificial general intelligence (AGI)—by which we mean highly autonomous systems that outperform humans at most economically valuable work—benefits all of humanity. We will attempt to directly build safe and beneficial AGI, but will also consider our mission fulfilled if our work aids others to achieve this outcome.
No current LLM is that, and Transformers may always be too sample-expensive for that.
But if anyone does make such a thing, OpenAI won't mind… so long as the AI is "safe" (whatever that means).
OpenAI has been totally consistent in saying that safety includes assuming weights are harmful until proven safe, because you cannot un-release a harmful model; other researchers say the opposite, on the grounds that white-box safety research is easier and more consistent.
I lean towards the former, not because I fear LLMs specifically, but because the irreversibility, and the fact that we don't know how close or far we are, means it's a habit we should turn into a norm before it's urgent.
In general I think this approach of "super easy capture into an append-only log" is great, especially if it can be paired with features for editing / re-discovery / search / synthesizing old ideas together, which live in a separate view/mode from the "just get something down as fast as possible" mode. I'm working on something like this, but only in nights-and-weekends free time around other obligations, so it's been slow going.
Yeah, it's a balance. I love being able to help, and I am generally in favor of asking questions early, but not ones of the form "hey so I ran this code and it errored. Help?"
"... did you read the stack trace? Did you look at the code referenced by the stack trace?"
This is where I've learned responding with "Sure! What have you tried so far?" is relevant.
[0] https://github.com/dottxt-ai/outlines [1] https://github.com/guidance-ai/guidance