The paper the article refers to is fantastic! I think it's work that most people in ML research should become familiar with. And if you believe in the power of benchmarks and data, this holds even more. Investing in dataset diversity is likely an impactful way to make progress in AI/ML.
Minor typo in this article...
ARTICLE: Among their findings – based on core data from the Facebook-led community project Papers With Code (PWC) – the authors contend that ‘widely-used datasets are introduced by only a handful of elite institutions’, and that this ‘consolidation’ has increased to 80% in recent years.
...but right after, the article quotes the paper, and the figure there is clearly 50%, not 80%. See the quote from the paper:
PAPER: ‘[We] find that there is increasing inequality in dataset usage globally, and that more than 50% of all dataset usages in our sample of 43,140 corresponded to datasets introduced by twelve elite, primarily Western, institutions.’
...and the article leaves out this relevant passage from the paper:
PAPER: ‘Moreover, this concentration on elite institutions as measured through Gini has increased to over 0.80 in recent years (Figure 3 right red). This trend is also observed in Gini concentration on datasets in PWC more generally (Figure 3 right black).’
...and in general the article is right that inequality is increasing over time, but the Gini coefficient is a specific measure of inequality, and a Gini of 0.80 is not the same as "80% consolidation".
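To make the difference concrete, here is a minimal Python sketch that computes both numbers for a made-up distribution of dataset usages across 20 hypothetical institutions. The counts are invented for illustration and are not taken from the paper, and the helper names `gini` and `top_k_share` are my own. It uses the standard sorted-values formula for the Gini coefficient.

```python
def gini(counts):
    """Gini coefficient of non-negative counts: 0 means perfectly equal,
    values near 1 mean usage is concentrated in very few entries."""
    xs = sorted(counts)  # ascending order is required by the formula below
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with ranks i = 1..n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def top_k_share(counts, k):
    """Fraction of all usages that falls on the k largest entries."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum(sorted(counts, reverse=True)[:k]) / total


# Hypothetical usage counts for 20 institutions (invented numbers).
usages = [0] * 16 + [1, 1, 1, 3]

print(f"Gini coefficient: {gini(usages):.2f}")            # Gini coefficient: 0.85
print(f"Top-1 share:      {top_k_share(usages, 1):.0%}")  # Top-1 share:      50%
```

With these toy counts the single largest institution holds 50% of usages, yet the Gini coefficient is about 0.85. The Gini summarizes the whole distribution (including how many low-usage institutions are counted), so reading it as "percent of usage held by the top" mixes up two different quantities.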