I'm hardly the best person to give a point-by-point on how modern neural networks work. The original paper that kind of brought together a bunch of ideas that were floating around is called "Attention is All You Need" in 2017 (and those folks are going to win a Turing almost certainly) and built on a bunch of `seq2seq` and Neural Turing Machine stuff that was in the ether before that.
Karpathy has a great YouTube series where he gets into the details from `numpy` on up, and George Hotz is live-coding the obliteration of PyTorch as the performance champion on the more implementation side as we speak.
Altman being kind of a dubious-seeming guy who pretty clearly doesn't regard the word "charity" the same way the dictionary does is more-or-less common knowledge, though not often mentioned by aspiring YC applicants for obvious reasons.
Mistral is a French AI company founded by former big hitters from the likes of DeepMind and Meta, who brought the best of 2023's openly published developments together into one model that shattered expectations of both what was realistic with open weights and what was possible without a Bond Villain posture. That model is "Mixtral", an 8-way mixture-of-experts model using a whole bag of tricks, but key among them are:
- gated mixture of experts in attention models
- sliding window attention / context
- direct preference optimization, a.k.a. DPO (probably the big one, and likely the one OpenAI is struggling to keep up with, more institutionally than technically, since a bunch of bigshots have a lot of skin in the InstructGPT/RLHF/PPO game)
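For the curious, the routing trick at the heart of a sparse mixture-of-experts layer is small enough to sketch in a few lines of `numpy`. This is a toy with random weights and made-up dimensions, not Mixtral's actual code, but it shows the key property: the router picks the top 2 of 8 experts per token, so the other 6 never run.

```python
import numpy as np

def top2_moe(x, W_gate, experts):
    """Toy sparse MoE forward pass for a single token embedding x.

    W_gate: (d, n_experts) router weights; experts: list of callables.
    """
    logits = x @ W_gate                     # one router score per expert
    top2 = np.argsort(logits)[-2:]          # indices of the two best experts
    weights = np.exp(logits[top2])
    weights /= weights.sum()                # softmax over the chosen pair only
    # Weighted sum of just two expert outputs; the rest are never evaluated,
    # which is why an "8x7B" model runs at roughly 2x7B cost per token.
    return sum(w * experts[i](x) for w, i in zip(weights, top2))

rng = np.random.default_rng(0)
d, n_experts = 8, 8
W_gate = rng.normal(size=(d, n_experts))
# Each "expert" here is just a random linear map, standing in for an FFN block.
experts = [(lambda W: (lambda x: x @ W))(rng.normal(size=(d, d)))
           for _ in range(n_experts)]
y = top2_moe(rng.normal(size=d), W_gate, experts)
print(y.shape)  # (8,)
```

In the real thing this happens per token, per MoE layer, with trained gate weights and a load-balancing loss so the router doesn't collapse onto two favorite experts, but the routing math itself really is about this small.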
It's common knowledge that GPT-4 and derivatives were mixture models but no one had done it blindingly well in an open way until recently.
SaaS companies doing "AI as a service" have a big wall in front of them called "60%+ of the TAM can't upload their data to random-ass cloud providers much less one run by a guy recently fired by his own board of directors", and for big chunks of finance (SOX, PCI, bunch of stuff), medical (HIPAA, others), defense (clearance, others), insurance, you get the idea: on-premise is the play for "AI stuff".
A scrappy group of hackers too numerous to enumerate, but exemplified by `ggerganov` and collaborators, `TheBloke` and his backers, George Hotz and other TinyGrad contributors, and best exemplified in the "enough money to fuck with foundation models" sense by Mistral at the moment, are pulling a Torvalds and making all of this free-as-in-I-can-download-and-run-it. This gets very little airtime, all things considered, because roughly no one sees a low-effort path to monetizing it in the capital-E enterprise: that involves serious work and very low shady factors, which seems an awful lot like hard work to your bog-standard SaaS hustler and offers almost no mega data-mining opportunity to the somnambulant FAANG crowd. So it's kind of a fringe thing in spite of being clearly the future.