Where were these data sourced from? As far as I know, StackOverflow does not publicize internal analytics with such granularity. If these figures are real, are they leaked?
We can also see that lately there are more questions than answers, which shows that most experts are no longer that active, or that there are more beginners and fewer experts overall.
Stack Overflow used to release their data archives quarterly on BigQuery. Looking at the BQ datasets, they were last updated Nov 2022, which doesn't have the latest 2023 info in the submission.
Thanks for sharing, good to see alternative options popping up. My wish is that the Stack Exchange dataset could one day be provided as a streaming parquet or arrow table, as underfunded grads and post-grads could then more easily/selectively sample the datasets (similar to how Huggingface provides some of its datasets)[1][2].
The Hugginface repo unfortunately prefilters some of the tables/rows according to some criteria, making it less usable for general analytical queries that the BQ or SEDE datasets enable. If anyone knows of an 'XML-streaming' solution that directly samples from the Internet Archive's data dumps, I am all ears.