Phi-3 blows this out of the water. Benchmark | Gemma 2 (9B) | Phi-3 Small (7B) --...

ferretj · on June 27, 2024

Another take on this: Phi-3 Small has an Elo of 1100 on LMSYS (ranked #52), while the confidence interval for Gemma 2 9B is [1170, 1200] Elo (ranked between #15 and #25).
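To put that gap in perspective, here is a rough sketch that converts the rating difference into an expected head-to-head win rate. It assumes the textbook Elo expectation formula (base 10, scale 400); the function name is made up for illustration, and the leaderboard's own fitting procedure may differ in detail.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo formula.

    Assumption: base-10 logistic curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Ratings quoted in the comment above.
phi3_small = 1100
gemma2_low, gemma2_high = 1170, 1200

# Win rate for Gemma 2 9B at each end of its confidence interval.
low = expected_win_rate(gemma2_low, phi3_small)
high = expected_win_rate(gemma2_high, phi3_small)

print(f"Gemma 2 9B expected win rate vs Phi-3 Small: {low:.2f} to {high:.2f}")
```

Under those assumptions, a 70 to 100 point gap works out to Gemma 2 9B being preferred in roughly 60% to 64% of direct comparisons.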

moffkalast · on June 27, 2024

Phi is notorious for benchmark overfitting. It's good, but not as good as it looks on the charts. On the LMSYS leaderboard it places a whole 23 spots behind Llama-3-8B, which it also claims to soundly beat in the benchmarks above. So YMMV.

Garcia98 · on June 27, 2024

Pretraining on the Test Set Is All You Need

https://arxiv.org/abs/2309.08632