How do you know GPT-4 is 1-shot? The details about it aren't released; it's entirely possible it does stuff in multiple stages. Why wouldn't OpenAI use their most powerful version to get better stats, especially when they don't say how they got them?
Google being more open here about what they do is in their favor.
There's a rumour that GPT-4 runs every query either 8x or 16x in parallel, and then picks the "best" answer using an additional AI that is trained for that purpose.
It would have to pick each token then, no? Because you can get a streaming response, which would completely invalidate the idea of the answer being picked after the fact.
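For what it's worth, here's a minimal sketch of what that rumored setup would look like: best-of-n sampling plus a separate "picker" model. Everything in it (the `generate` and `rerank_score` stand-ins, the candidate count) is hypothetical, just to make the streaming objection concrete: in this shape nothing can be streamed until every candidate has finished, unless selection happens per token instead.

```python
# Hypothetical best-of-n sampling with a reranker -- a sketch of the rumor,
# not anything OpenAI has confirmed.
import random

def generate(prompt: str, seed: int) -> str:
    """Stand-in for one full sampled completion from the base model."""
    random.seed(seed)
    return f"candidate answer #{seed} to: {prompt!r}"

def rerank_score(prompt: str, answer: str) -> float:
    """Stand-in for a separate 'picker' model scoring a finished answer."""
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    # All n candidates must be fully generated before the reranker can
    # score them -- which is exactly why a streamed response doesn't fit
    # this picture unless the selection happens token by token.
    candidates = [generate(prompt, seed=i) for i in range(n)]
    return max(candidates, key=lambda a: rerank_score(prompt, a))

if __name__ == "__main__":
    print(best_of_n("Why is the sky blue?", n=8))
```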
It's false; it's the nine-months-down-the-line telephone game version of an unsourced rumor re: a mixture-of-experts model (there's a rough sketch of what MoE actually does below, after the musings). Drives me absolutely crazy.
Extended musings on it, please ignore unless curious about evolution patterns of memes:
Funnily enough, it's gotten _easier_ to talk about over time -- i.e. on day 1 you can't criticize it because it's "just a rumor, how do you know?" -- on day 100 it's even worse because that effect hasn't subsided much, and it spread like wildfire.
On day 270, the same thing that gave it genetic fitness, the alluring simplicity of "ah yes, there's 8x going on", has become the core and only feature of the Nth round of the telephone game. There are no more big expert-sounding words around it to make it seem plausible.
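Since "mixture of experts" keeps getting garbled into "8 parallel models plus a judge", here's a rough sketch of what MoE routing actually looks like: a small gating network picks a couple of expert feed-forward blocks per token, inside a layer, and mixes their outputs. The dimensions and top-2 routing below are made-up illustration values, not a claim about GPT-4. The point is that the "experts" share one forward pass and get mixed per token inside the network; there's no separate judge picking a finished answer.

```python
# Toy mixture-of-experts layer: per-token routing to a few expert MLPs,
# not N whole models run in parallel with an answer-picker.
# All sizes and the top-2 routing are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_hidden, n_experts, top_k = 16, 64, 8, 2

# Each "expert" is just its own small feed-forward block.
experts = [
    (rng.standard_normal((d_model, d_hidden)) * 0.02,
     rng.standard_normal((d_hidden, d_model)) * 0.02)
    for _ in range(n_experts)
]
gate_w = rng.standard_normal((d_model, n_experts)) * 0.02  # router weights

def moe_layer(x: np.ndarray) -> np.ndarray:
    """x: (n_tokens, d_model). Each token is routed to its own top-k experts."""
    logits = x @ gate_w                          # (n_tokens, n_experts)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        top = np.argsort(logits[t])[-top_k:]     # this token's chosen experts
        weights = np.exp(logits[t][top])
        weights /= weights.sum()                 # softmax over the chosen few
        for w, e in zip(weights, top):
            w_in, w_out = experts[e]
            out[t] += w * (np.maximum(x[t] @ w_in, 0.0) @ w_out)
    return out

tokens = rng.standard_normal((5, d_model))       # 5 toy token embeddings
print(moe_layer(tokens).shape)                   # (5, 16) -- one output per token
```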
Same way I know the latest BMW isn't running on a lil nuke reactor. I don't, technically. But there's not enough comment room for me to write out the 1000 things that clearly indicate it. It's a "not even wrong" question on your part.