The hardest part about making a new architecture is that even if it is just bett...

p1esk · 2025-12-07T19:36:31 1765136191

> Until google puts in a lot of resources into training a scaled up version of this architecture

If Google is not willing to scale it up, then why would anyone else?

8note · 2025-12-08T01:44:48 1765158288

ChatGPT is an example of why.

falcor84 · 2025-12-08T15:23:39 1765207419

You think that this might be another ChatGPT/Docker/Hadoop case, where Google comes up with the technology but doesn't care to productize it?

tyre · 2025-12-07T20:11:35 1765138295

Google is large enough, well-funded enough, and the opportunity is great enough to run experiments.

You don't necessarily have to prove it out on large foundation models first. Can it beat a 32B-parameter model, for example?

swatcoder · 2025-12-07T20:48:32 1765140512

Do you think there might be an approval process to navigate when experiment costs might run to seven or eight digits and require months of reserved resources?

While they do have lots of money and many people, they don't have infinite money, and specifically they only have so much hot infrastructure to spread around. You'd expect they have to gradually build up the case that a large-scale experiment is likely enough to yield a big enough advantage over what's already claiming those resources.

dpe82 · 2025-12-08T08:01:09 1765180869

I would imagine they do not want their researchers unnecessarily wasting time fighting for resources - within reason. And at Google, "within reason" can be pretty big.

howdareme · 2025-12-08T10:34:35 1765190075

I mean, looking at Antigravity, Jules & Gemini CLI, they have no problem with their developers fighting for resources.

nl · 2025-12-08T11:45:49 1765194349

I mean you'd think so, but...

> In fact, the UL2 20B model (at Google) was trained by leaving the job running accidentally for a month.

https://www.yitay.net/blog/training-great-llms-entirely-from...

nickpsecurity · 2025-12-07T22:52:25 1765147945

But it's companies like Google that made tools like JAX and TPUs, saying we can throw together models with cheap, easy scaling. Their paper's math is probably harder to put together than an alpha-level prototype, which they need anyway.

So, I think they could default to doing it for small demonstrators.

m101 · 2025-12-08T00:15:56 1765152956

Prove it beats models of different architectures trained under identical limited resources?

UltraSane · 2025-12-07T17:11:18 1765127478

Yes. The path dependence for current attention-based LLMs is enormous.

patapong · 2025-12-07T19:10:11 1765134611

At the same time, there is now a ton of data for training models to act as useful assistants, and benchmarks to compare different assistant models. The wide availability and ease of obtaining new RLHF training data will make it more feasible to build models on new architectures, I think.