Hacker News

Where did you get the top ten from?

https://huggingface.co/spaces/TIGER-Lab/MMLU-Pro

Are you discounting all of the self-reported scores?



Came here to say this. It's behind the 14B Phi-reasoning-plus (which is self-reported).

I don't understand why the model size is 'unknown' for "TIGER-Lab"-sourced scores.



