Yes, a lot of those have pretty egregious annotation mistakes. Once you get into the high percentages, it's often worth going through your dataset and comparing the labels against your model's predictions. Obviously you can't do that on academic benchmarks (though some papers still do).
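For what it's worth, here is a rough sketch of that kind of comparison. It assumes you have integer class labels and per-class predicted probabilities as plain lists; the function name `find_suspect_labels` and the `min_confidence` threshold are made up for illustration, not from any particular library. It only surfaces candidates for a human to look at; it does not decide which side is right.

```python
def find_suspect_labels(labels, probs, min_confidence=0.9):
    """Return (index, label, predicted, confidence) for examples where the
    model confidently disagrees with the annotation, most confident first.

    labels: list of integer class ids (the annotations)
    probs:  list of per-class probability lists, one per example
    min_confidence: hypothetical cutoff; tune it for your own data
    """
    suspects = []
    for i, (label, p) in enumerate(zip(labels, probs)):
        # Predicted class is the index with the highest probability.
        predicted = max(range(len(p)), key=p.__getitem__)
        confidence = p[predicted]
        if predicted != label and confidence >= min_confidence:
            suspects.append((i, label, predicted, confidence))
    # Review the most confident disagreements first.
    suspects.sort(key=lambda s: s[3], reverse=True)
    return suspects


# Toy data, invented for the example.
labels = [0, 1, 1, 0]
probs = [[0.95, 0.05], [0.10, 0.90], [0.97, 0.03], [0.40, 0.60]]

for idx, label, pred, conf in find_suspect_labels(labels, probs):
    print(f"example {idx}: annotated {label}, model says {pred} ({conf:.2f})")
```

The examples it flags still need manual review, since a confident disagreement can just as easily be a model error as an annotation error.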