> All evals are 0 shot

My bet is that this is the reason they score so high on "their" benchmarks. For models that are trained purely on unlabelled data, like LLaMA, 0-shot evaluation won't work well.

E.g., for LLaMA, HellaSwag accuracy is 57.13% in their benchmark, compared to 78.59% in [1].

[1]: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderb...
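
To make the 0-shot/few-shot distinction concrete, here is a minimal sketch (illustrative only; build_prompt is a made-up helper, not the harness's actual HellaSwag template). In a k-shot eval the prompt opens with k solved examples, so a base model can infer the task format; 0-shot gives it only the bare query:

    # Hypothetical prompt builder, for illustration only.
    def build_prompt(query, examples=()):
        # k-shot: prepend k solved (context, completion) pairs.
        # 0-shot: examples is empty, so the model sees only the query.
        shots = "".join(f"{ctx} {ans}\n\n" for ctx, ans in examples)
        return shots + query

    zero_shot = build_prompt("A man is sitting on a roof. He")
    one_shot = build_prompt(
        "A man is sitting on a roof. He",
        examples=[("A woman opens a jar. She", "twists the lid off.")],
    )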

I think this is simply the default in lm-evaluation-harness. They said they ran every benchmark they could out of the box.
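
If so, the default is easy to override. A sketch assuming a recent lm-evaluation-harness, where simple_evaluate accepts a num_fewshot argument (the pretrained model id below is just a placeholder); the leaderboard in [1] runs HellaSwag few-shot rather than 0-shot, which would account for much of the gap:

    # Sketch against a recent lm-evaluation-harness (v0.4-style API).
    from lm_eval import simple_evaluate

    results = simple_evaluate(
        model="hf",
        model_args="pretrained=huggyllama/llama-7b",  # placeholder model id
        tasks=["hellaswag"],
        num_fewshot=10,  # override the 0-shot default
    )
    print(results["results"]["hellaswag"])

The CLI exposes the same knob as --num_fewshot.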
