The improvements over ChatGPT are counted in (very) few percentage points. Does this mean they have entered a diminishing-returns phase, or is it that each percentage point is much harder to earn than the previous ones?
> We’re already starting to experiment with Gemini in Search, where it's making our Search Generative Experience (SGE) faster for users, with a 40% reduction in latency in English in the U.S., alongside improvements in quality.
This feels like Google achieved more efficient inference. Probably a leaner model compared to GPT.
Not sure, but you could also look at the inverse: e.g. an improvement from 90% to 95% can also be interpreted as going from a 10% failure rate to a 5% failure rate, i.e. half as many failures, which is a very big improvement. It depends on a lot of things, but it's possible this could feel like a very big improvement in practice.
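That inversion is just arithmetic; a quick sketch (the numbers are illustrative, not from any actual benchmark):

```python
# Same benchmark jump, two framings: small as an accuracy gain,
# large as a reduction in failures. Scores are made up.
old_acc, new_acc = 0.90, 0.95

accuracy_gain = new_acc - old_acc                  # looks like "only 5 points"
failure_ratio = (1 - new_acc) / (1 - old_acc)      # failures cut in half

print(f"accuracy gain: {accuracy_gain:.0%}")       # 5%
print(f"remaining failures: {failure_ratio:.0%} of before")  # 50%
```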
Training large language models is characterised by diminishing returns: the first billion training inputs reduce the loss more than the second billion, the second billion reduce it more than the third, and so on. The same goes for increases in model size; the improvement is less than linear.
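The shape of that curve can be sketched with a toy power-law loss, which is how scaling behaviour is often modelled (the coefficients here are invented for illustration, not fitted to any real model):

```python
# Toy power-law loss curve L(n) = a * n**(-alpha) + c, with made-up
# coefficients. Each additional billion tokens reduces the loss less
# than the previous billion did.
a, alpha, c = 10.0, 0.05, 1.5

def loss(tokens_billions):
    return a * tokens_billions ** -alpha + c

first_gain = loss(1) - loss(2)    # loss drop from the second billion
second_gain = loss(2) - loss(3)   # loss drop from the third billion
print(first_gain, second_gain)    # first_gain > second_gain
```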
It may mean that the evaluations' useful range for distinguishing improvements is limited. If it's a 0-100 score on a defined set of tasks, chosen because they were hard enough to distinguish model quality a while back, the rapid rate of improvement may mean they are no longer useful for distinguishing the quality of current models. And that's even aside from the problem that it is increasingly hard to stop the actual test tasks from being reflected in training data in some form.
Probably just reflects that they are playing catch-up with OpenAI, and it would not look good if they announced that their latest, greatest (to be available soon) was worse than what OpenAI has been shipping for a while. So I assume that being able to claim superiority (by even the smallest amount) over GPT-4 was the gating factor for this announcement.
I doubt LLMs are close to plateauing in terms of performance unless there's already an awful lot more to GPT-4's training than is understood. It seems like even simple stuff like planning ahead (e.g. to fix "hallucinations", aka bullshitting) is still to come.
They want to release immediately to please shareholders, but only if they're beating SOTA on benchmarks. Therefore we will usually get something that beats SOTA by a little bit, because the alternative (short of a huge breakthrough) would be to delay the release longer, which serves no business purpose.