Don't look at absolute number, instead think of it in terms of relative improvem...

machiaweliczny · on Dec 6, 2023

This + some benchmarks are shitty thus rational model should be allowed to not answer them but ask claryfying questions.

belval · on Dec 6, 2023

Yes, a lot of those have pretty egregious annotation mistakes. Once you get in high percentage it's often worth going through your dataset with your model prediction and compare. Obviously you can't do that on academic benchmarks (though some papers still do).