> but then again if you'd showed me an RPi5 back in 1977 I would have said "nah, impossible" so who knows?
I was reading lots of scifi in 1977, so I may have tried to talk to the Pi like Scotty trying to talk to the mouse in Star Trek IV. And since you can run an LLM and text-to-speech on an RPi5, it might have answered.
Obviously not even Avon figured out that the main box of Orac was a distraction, a fancy base station to hold the power supply, WiFi antenna, GPS receiver, and some Christmas tree lights, and all of the computational power was really in the activation key.
The amusing thing is that this is not the only 1970s SciFi telly prop that could become almost real today. It shouldn't be hard -- all of the components exist -- to make an actual Space: 1999 commlock; not just a good impression of one, but a functioning one that could do teleconferencing over a LAN, IR control for doors and tellies and stuff, and remote computer access.
Like James Bond's Aston Martin with a satnav/tracking device in 1964's Goldfinger. Kids would know what that was, but they might not understand why Bond had to continually shift some sort of stick to change the car's gears.
You can get a driver's license with an automatic. But it just means you can only drive automatics.
Not being able to drive a manual would have been a huge deal 20 years ago, but with hybrids and EVs all being automatic, it's not much of a downside nowadays unless you want to buy an old car or borrow a friend's. Most rental fleets have automatics available these days.
At this point, it is a historical artefact that will cease to exist soon enough.
Electric vehicles do not have multi-speed gearboxes or torque converters, so there is nothing to shift up or down. The few performance EVs that have been announced (and maybe even released) with a gear stick do so for nostalgic reasons; the gear shift and the accompanying experience are simulated entirely in software.
It’s impossible to explain to kids now why it was funny on Seinfeld when Kramer pretended to be Moviefone and said “why don’t you just tell me the name of the movie you selected!”
> Someday real soon, kids being shown episodes of 'Knight Rider' by their grandparents won't understand why a talking car was so futuristic.
Maybe in 100 years. The talking car was more intelligent than Siri, Alexa or Hey Google.
It is not that we are not able to "talk" to computers, it is that we "talk" with computers only so that they can collect more data about us. Their "intelligence" is limited to simple text understanding.
Had the same thing happen when standalone GPS units were introduced to the mass market. I got one, and one of my tech friends suggested replacing the stock voice with a sarcastic/whiny one (maybe ... C3PO)...
My response was: "you obviously haven't used one yet"...
They were already bossy enough... 'MAKE A LEGAL U-TURN NOW!'
Not really. My 1983 Datsun would talk, but it couldn't converse. Alexa and Siri couldn't hold a conversation anywhere near the level KITT did. There's a big difference. With LLMs, we're getting close.
That brings back some memories. My friend and I messed around with S.A.M. on his Atari 800 a lot when we were kids. We would crank call the parents of other kids we knew and have SAM tell them their kids had skipped school and might get suspended. It was funny to some twelve-year-olds, anyway.
SAM had a basic mode where you just type English, but it also had an advanced phonetic input mode where you could control the sound and stress on every syllable. My favorite thing to do was try to give SAM a British accent.
Oh really? What vehicle can I buy today, drive home, get twice the legal limit drunk, flop in the back alone to take a nap while my car drives me two hours away to a relative's house?
I'd really like to buy that car so I await your response.
That's a jurisdiction problem, not a technology problem. No tech is foolproof, but even with the current technology someone would be much safer (for others, too) in the back seat than trying to drive tired, borderline DUI, at night, in an unfamiliar town. Which many folks regularly do, for example on business travel.
The reason I cannot do this today is laws, not technology. My 2c.
The claim is that self driving is mundane - something everyone can have if they want. A standard feature, so entwined in the background of life that it is unremarkable.
The fact that there is no system out there that I can own, jump into the back of in no condition to drive, and trust to get me to my destination safely defeats that claim. It's not even so mundane that everyone has the anemic Tesla self-driving feature that runs over kids and slams into highway barriers.
It may also be a matter of laws, but the underlying tech is still not there either: even if the laws weren't there, every current "self driving car" system warns you to keep paying attention to the road and to keep your hands on the wheel.
Could I get behind the wheel of my self driving car, drunk, and make it there safely? No, I definitely couldn't, and I understand why those laws exist with all of the existing failure modes of self driving cars.
People have called the current state of LLMs "sparkling AutoComplete". The current state of "self-driving cars" is "sparkling lane assist" with a chaser of adaptive cruise control.
You can do all that in a Waymo except for the “buy” part. When asked about that Sergey said “why do you want to own a car? You have to maintain it, insure it, park it at home and at work. Don’t you really just want to get where you’re going and have someone else figure out the rest?”
This was back before google ate the evil pill. Now their philosophy is more like “don’t fall asleep, we can get a good deal on your kidneys, after that we’ll sell your mom’s kidneys too”
No need for an RPi 5. Back in 1982, a dual or quad-CPU X-MP could have run a small LLM, say, with 200–300K weights, without trouble. The Crays were, ironically, very well suited for neural networks; we just didn’t know it yet. Such an LLM could have handled grammar and code autocompletion, basic linting, or documentation queries and summarization. By the late 80s, a Y-MP might even have been enough to support a small conversational agent.
A modest PDP-11/34 cluster with AP-120 vector coprocessors might even have served as a cheaper pathfinder in the late 70s for labs and companies that couldn't afford a Cray-1 and its infrastructure.
But we lacked both the data and the concepts. Massive, curated datasets (and backpropagation!) weren’t even a thing until the late 80s or 90s. And even then, they ran on far less powerful hardware than the Crays. Ideas and concepts were the limiting factor, not the hardware.
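Some napkin math in support (my own figures: one 64-bit word per weight, ~2 FLOPs per weight per generated token, and a deliberately conservative sustained rate for an X-MP CPU):

    params = 300_000
    mem_mb = params * 8 / 1e6          # ~2.4 MB of weights, a rounding error for an X-MP
    flops_per_token = 2 * params       # ~0.6 MFLOPs per generated token
    sustained_mflops = 100             # conservative; peak was several hundred MFLOPS
    tokens_per_sec = sustained_mflops * 1e6 / flops_per_token
    print(mem_mb, tokens_per_sec)      # -> 2.4, ~166 tokens/s

Even with pessimistic assumptions, a model that size is nowhere near the machine's limits.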
A "small Large Language Model", you say? So a "Language Model"? ;-)
> Such an LLM could have handled grammar and code autocompletion, basic linting, or documentation queries and summarization.
No, not even close. You're off by 3 orders of magnitude if you want even the most basic text understanding, 4 OOM if you want anything slightly more complex (like code autocompletion), and 5–6 OOM for good speech recognition and generation. Hardware was very much a limiting factor.
I would have thought the same, but EXO Labs showed otherwise by getting a 300K-parameter LLM to run on a Pentium II with only 128 MB of RAM at about 50 tokens per second. The X-MP was in the same ballpark, with the added benefit of native vector processing (not just some extension bolted onto a scalar CPU), which performs very well on matmul.
John Carmack has also hinted at this: we might have had AI decades earlier. Obviously not large GPT-4-class models, but useful language reasoning at a small scale was possible. The hardware wasn't that far off; the software and incentives were.
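A quick sanity check on what that demo implies about the hardware (my arithmetic, using the usual ~2 FLOPs per parameter per token rule of thumb):

    params = 300_000
    tokens_per_sec = 50
    sustained_flops = 2 * params * tokens_per_sec   # = 30 MFLOPS sustained
    # A Pentium II peaks at a few hundred MFLOPS, and a single X-MP CPU was
    # rated well above that, so 50 tokens/s leaves headroom on both machines.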
> EXO Labs showed otherwise by getting a 300K-parameter LLM to run on a Pentium II with only 128 MB of RAM at about 50 tokens per second
50 tokens/s is completely useless if the tokens themselves are useless. Just look at the "story" generated by the model presented in your link: each individual sentence is somewhat grammatically correct, but they have next to nothing to do with each other; they make absolutely no sense. Take this, for example:
"I lost my broken broke in my cold rock. It is okay, you can't."
Good luck tuning this for turn-based conversations, let alone for solving any practical task. This model is so restricted that you couldn't even benchmark its performance, because it wouldn't be able to follow the simplest of instructions.
You're missing the point. No one is claiming that a 300K-param model on a Pentium II matches GPT-4. The point is that it works: it parses input, generates plausible syntax, and does so using algorithms and compute budgets that were entirely feasible decades ago. The claim is that we could have explored and deployed narrow AI use cases decades earlier, had the conceptual focus been there.
Even at that small scale, you can already do useful things like basic code or text autocompletion, and with a few million parameters on a machine like a Cray Y-MP, you could reasonably attempt tasks like summarizing structured or technical documentation. It's constrained in scope, granted, but it's a solid proof of concept.
The fact that a functioning language model runs at all on a Pentium II, with resources not far off from a 1982 Cray X-MP, is the whole point: we weren’t held back by hardware, we were held back by ideas.
Llama 3 8B took 1.3M hours to train on an H100-80GB.
Of course, it didn't take 1.3M hours (~150 years) of wall-clock time. So, many machines with 80GB were used.
Let's do some napkin math. 150 machines with a total of 12TB VRAM for a year.
So, what would be needed to train a 300K parameter model that runs on 128MB RAM? Definitely more, much more than 128MB RAM.
Llama 3 runs on 16GB VRAM. Let's imagine that's our Pentium II of today. You need at least 750 times what is needed to run it in order to train it. So, you would have needed ~100GB RAM back then, running for a full year, to get that 300K model.
How many computers with 100GB+ RAM do you think existed in 1997?
Also, I only did RAM. You also need raw processing power and massive amounts of training data.
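Spelling out the RAM side of that napkin math (same figures as above):

    h100_hours = 1.3e6                         # quoted Llama 3 8B training cost
    gpus_for_a_year = h100_hours / (365 * 24)  # ~148 machines running for a year
    total_vram_gb = gpus_for_a_year * 80       # ~11,900 GB, i.e. ~12 TB
    train_vs_run = total_vram_gb / 16          # ~742x the 16 GB needed to run it
    ram_1997_gb = train_vs_run * 0.128         # ~95 GB to train the 300K model
    print(gpus_for_a_year, total_vram_gb, train_vs_run, ram_1997_gb)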
You’re basically arguing that because A380s need millions of liters of fuel and a 4km runway, the Wright Flyer was impossible in 1903. That logic just doesn’t hold. Different goals, different scales, different assumptions.
The 300K model shows that even in the 80s, such a model was both possible to run and sufficient for narrow but genuinely useful tasks.
We simply weren’t looking, blinded by symbolic programming and expert systems. This could have been a wake-up call, steering AI research in a completely different direction and accelerating progress by decades. That’s the whole point.
"I mean, today we can do jet engines in garage shops. Why would they needed a catapult system? They could have used this simple jet engine. Look, here is the proof, there's a YouTuber that did a small tiny jet engine in his garage. They were held back by ideas, not aerodynamics and tooling precision."
See how silly it is?
Now, focus on the simple question: how would you train the 300K model in 1997? To run it, you need someone to train it first.
Reductio ad absurdum. A 300K-param model was small enough to be trained offline, on curated datasets, with CPUs and RAM capacities that absolutely existed at the time, especially in research centers.
Backprop was known. Data was available. Narrow tasks (completion, summarization, categorization) were relevant. The model that runs on a Pentium II could have been trained on a Cray, or, over a longer stretch of time, on any reasonably powerful 90s workstation. That’s not fantasy: LeNet-5, with its ~60K weights, was trained on a mere Sun workstation in the 90s.
The limiting factor wasn’t compute, it was the conceptual framing as well as the datasets. No one seriously tried, because the field was dominated by symbolic logic and rule-based AI. That’s the core of the argument.
> In 1989 a recognizer as complex as LeNet-5 would have required several weeks’ training and more data than were available and was therefore not even considered.
Their own words seem to match my assessment.
Training time and data availability determined how much this whole thing could advance, and researchers were aware of those limits.
I think a quad-CPU X-MP is probably the first computer that could have run (not train!) a reasonably impressive LLM if you could magically transport one back in time. It supported a 4GB (512 MWord) SRAM-based "Solid State Drive" with a supported transfer bandwidth of 2 GB/s, and about 800 MFLOPS CPU performance on something like a big matmul. You could probably run a 7B parameter model with 4-bit quantization on it with careful programming, and get a token every couple seconds.
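Rough arithmetic behind that guess, assuming the weights stream from the SSD on every token and ~2 FLOPs per parameter per token:

    params = 7e9
    weights_gb = params * 0.5 / 1e9       # 3.5 GB of 4-bit weights
    bandwidth_bound = weights_gb / 2.0    # ~1.75 s/token at 2 GB/s SSD bandwidth
    compute_bound = 2 * params / 800e6    # ~17.5 s/token at 800 MFLOPS
    # The SSD side allows a token every couple of seconds; the FLOP count is
    # the tighter bound, so the careful programming would mostly go into the matmuls.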
This sounds plausible and fascinating.
Let’s see what it would have taken to train a model as well.
Given an estimate of 6 FLOPs per parameter per training token, and a Llama-style budget of roughly 300 billion tokens, training a 7B parameter model would require about 1.26×10^22 FLOPs. That translates to roughly 500,000 years on an 800 MFLOPS X-MP, far too long to be feasible.
Training a 100M parameter model on a few billion tokens would still take nearly 70 years.
However, a 7M-parameter model trained on a few hundred million tokens would only have required about six months, and a 14M one about a year, so let’s settle on 10 million parameters. That’s already far more reasonable than the 300K model I mentioned earlier.
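The same arithmetic in code (the token budgets are my inference to reproduce those figures; the 6 FLOPs per parameter per token rule and 800 MFLOPS are as above):

    def train_years(params, tokens, flops_per_sec=800e6):
        # 6 FLOPs per parameter per training token, converted to wall-clock years
        return 6 * params * tokens / flops_per_sec / (365 * 24 * 3600)

    print(train_years(7e9, 300e9))    # ~500,000 years
    print(train_years(100e6, 3e9))    # ~70 years
    print(train_years(14e6, 300e6))   # ~1 year
    print(train_years(7e6, 300e6))    # ~6 months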
Moreover, a 10M parameter model would have been far from useless. It could have performed decent summarization, categorization, basic code autocompletion, and even powered a simple chatbot with a short context, all that in 1984, which would have been pure sci-fi back in those days. And pretty snappy too, maybe around 10 tokens per second if not a little more.
Too bad we lacked the datasets and the concepts...