What's the dataset size? If there are billions of pedestrians tagged and a couple hundred are missing, would it actually have a big impact on training? Also, it looks like a lot of the people in those images are off the road. What's the current standard in AV? Is everything on the sidewalk tagged? (Genuine questions)
If they are reporting a 33% image error rate, I would expect a large effect on accuracy no matter what the individual annotation error rates were.
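To see why an image-level error rate gets so large, note that even a modest per-annotation error rate compounds across the several boxes in a typical image. A rough sketch (all numbers here are hypothetical for illustration, not taken from the dataset, and errors are assumed independent):

```python
# Hypothetical illustration: per-annotation errors compound into a much
# higher image-level error rate. Numbers are made up, not from the dataset.
def image_error_rate(per_annotation_error: float, boxes_per_image: int) -> float:
    """P(at least one bad label in an image), assuming independent errors."""
    return 1 - (1 - per_annotation_error) ** boxes_per_image

# With ~5% bad annotations and 8 boxes per image, about a third of
# images end up containing at least one error.
rate = image_error_rate(0.05, 8)
print(f"{rate:.0%}")  # → 34%
```

So a 33% image error rate doesn't even require a terrible per-annotation error rate; it falls out of the math once images contain multiple objects.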
I'm unfamiliar with the detailed tagging standards, but that also seems irrelevant until the more egregious problems are resolved. Get everything that's currently untagged labeled first, then look at smaller-scoped issues. And thanks, Roboflow, for doing whatever amount of this.
The dataset is 15,000 images. Breakdown of the number of labels per class (post fixes) is here: https://i.imgur.com/bOFkueI.png
Not all of the pedestrians and cyclists were on the sidewalk, no (e.g. the kid on his bike in the road and the lady with a stroller in a crosswalk).
I stuck with what appeared to be the conventions of the original dataset (all people labeled as pedestrians whether on the road or not). They just weren't applied very well or consistently.
I also think that makes the most sense in this context: if you were building a self-driving car, this layer of the stack would want to know where the people are. Higher up, you can combine that with where you know the roads/crosswalks/stoplights are (and the delta of each person's position between frames) to predict where they might go next, so your car can act accordingly.
For example, a stationary pedestrian at a corner will probably cross the street when the light turns green; if you're turning, you need to factor that in.
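That "delta of position between frames" idea can be sketched very simply (this is a toy constant-velocity extrapolation, not how any real AV stack does prediction; all names and numbers are hypothetical):

```python
# Hypothetical sketch: extrapolate a pedestrian's future position from the
# delta of their bounding-box center between two frames. Real AV stacks use
# far more sophisticated trackers; this just illustrates the layering idea.
from typing import Tuple

Point = Tuple[float, float]

def predict_position(prev_center: Point, curr_center: Point,
                     dt: float, horizon: float) -> Point:
    """Constant-velocity extrapolation `horizon` seconds ahead."""
    vx = (curr_center[0] - prev_center[0]) / dt
    vy = (curr_center[1] - prev_center[1]) / dt
    return (curr_center[0] + vx * horizon, curr_center[1] + vy * horizon)

# Box center moved 2px right and 1px down over one 30fps frame;
# extrapolate half a second ahead.
future = predict_position((100.0, 200.0), (102.0, 201.0), dt=1 / 30, horizon=0.5)
print(future)
```

A stationary pedestrian (zero delta) extrapolates to their current spot, which is exactly why the higher-level context (green light, crosswalk) has to supply the "they'll probably start crossing" part.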