https://huggingface.co/spaces/TIGER-Lab/MMLU-Pro
Are you discounting all of the self-reported scores?
I don't understand why scores sourced from "TIGER-Lab" are listed as 'unknown' for model size.