It's just one of the low-hanging AI fruits we can pick now, nothing unexpected.
We're doing similar things to find errors in our regular ML datasets - a large proportion of the examples the model gets wrong turn out to be mislabeled. Those mislabeled examples carry a big performance penalty. Since Wikipedia is often used in ML, it was time to clean it up.
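For anyone curious what "look at the examples the model can't predict" might look like in practice, here's a rough sketch. It's not our actual pipeline: the toy data, the 1-D leave-one-out k-NN "model", and the function names `knn_predict` and `suspect_labels` are all made up for illustration. The idea is just to predict each example's label from the rest of the data and flag the ones where the prediction disagrees with the given label, so a human can review them.

```python
from collections import Counter


def knn_predict(points, labels, query_idx, k=3):
    """Predict the label of points[query_idx] by majority vote of its
    k nearest neighbours, leaving the query example itself out."""
    qx = points[query_idx]
    others = [
        (abs(x - qx), labels[i])
        for i, x in enumerate(points)
        if i != query_idx
    ]
    others.sort(key=lambda pair: pair[0])  # closest first
    votes = Counter(label for _, label in others[:k])
    return votes.most_common(1)[0][0]


def suspect_labels(points, labels, k=3):
    """Return indices whose given label disagrees with the held-out prediction.
    These are candidates for review, not confirmed errors."""
    return [
        i
        for i in range(len(points))
        if knn_predict(points, labels, i, k) != labels[i]
    ]


# Toy data (hypothetical): two well-separated clusters, with one label
# deliberately flipped in each cluster (indices 2 and 8).
points = [1, 2, 3, 4, 5, 51, 52, 53, 54, 55]
labels = ["a", "a", "b", "a", "a", "b", "b", "b", "a", "b"]

suspects = suspect_labels(points, labels)
print(suspects)
```

On real data you'd swap the toy k-NN for held-out predictions from whatever model you already train, and the flagged examples would still need a manual check, since a hard-but-correct example gets flagged the same way as a mislabeled one.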