Where were these data sourced from? As far as I know, StackOverflow does not pub...

encomiast · on July 25, 2023

Users with over 25,000 reputation can see site analytics that include charts that look a lot like these. https://stackoverflow.com/help/privileges/site-analytics

fhoffa · on July 25, 2023

Looking at that source, there's a pretty straightforward explanation of where the page views come from.

- Last month, Stack Overflow had ~142,575,642 visits.

- Last month, Google gave 127,896,508 visits to Stack Overflow.

- Last month, Bing gave 7,491,274 visits to Stack Overflow.

You could say that the number of Stack Overflow pageviews depends mainly on:

- How often people are searching Google/Bing for answers.

- How often Google/Bing rank Stack Overflow high enough for people to click into it.
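To put the figures above in proportion, here is a minimal sketch that computes each search engine's share of last month's visits. The variable names are illustrative only, and the numbers are simply the ones quoted in this comment.

```python
# Visit counts quoted above (last month, per the site analytics page).
total_visits = 142_575_642
referrals = {"Google": 127_896_508, "Bing": 7_491_274}

# Share of all visits attributed to each search engine.
shares = {name: visits / total_visits for name, visits in referrals.items()}
combined_share = sum(referrals.values()) / total_visits

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"Combined: {combined_share:.1%}")
```

By these numbers, Google accounts for roughly 90% of visits and Bing for about 5%, so the two together are around 95% of the total.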

MontyCarloHall · on July 25, 2023

That must be it. It would have been nice for the original post to include this information.

XCSme · on July 25, 2023

I can confirm: https://i.snipboard.io/0wmnSv.jpg

XCSme · on July 25, 2023

We can also see that lately there are more questions than answers, which shows that most experts are no longer that active, or that there are more beginners and fewer experts overall.

dleeftink · on July 25, 2023

Besides the Archive.org data dump, there is also the Stack Exchange Data Explorer for which there are thousands of user queries[1].

For instance, this user query by Starball tracks network contributions over time[2][3].

[1]: https://data.stackexchange.com/meta.stackexchange/queries

[2]: https://data.stackexchange.com/meta.stackexchange/query/1759...

[3]: Static image if the query times out: https://i.stack.imgur.com/LYZQm.png

minimaxir · on July 25, 2023

Stack Overflow used to release their data archives quarterly on BigQuery. Looking at the BQ datasets, they were last updated in Nov 2022, so they don't contain the 2023 data shown in the submission.

fhoffa · on July 25, 2023

FWIW, I now analyze the Stack Overflow dumps on Snowflake

https://medium.com/snowflake/how-to-load-the-stack-overflow-...

dleeftink · on July 25, 2023

Thanks for sharing, good to see alternative options popping up. My wish is that the Stack Exchange dataset could one day be provided as a streaming parquet or arrow table, as underfunded grads and post-grads could then more easily/selectively sample the datasets (similar to how Hugging Face provides some of its datasets)[1][2].

The Hugging Face repo unfortunately prefilters some of the tables/rows according to some criteria, making it less usable for the general analytical queries that the BQ or SEDE datasets enable. If anyone knows of an 'XML-streaming' solution that directly samples from the Internet Archive's data dumps, I am all ears.

[1]: https://huggingface.co/docs/datasets-server/rows

[2]: https://huggingface.co/datasets/HuggingFaceGECLM/StackExchan...
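For what it's worth, a partial take on the 'XML-streaming' idea can be sketched with the Python standard library alone. This assumes a dump table has already been downloaded and decompressed (it does not read from the Internet Archive directly), and that the file has the usual dump layout of one root element containing `<row ... />` elements. The function name `sample_rows` and its parameters are hypothetical, and reservoir sampling is my choice of technique here, not something the dumps or any of the tools above provide.

```python
import random
import xml.etree.ElementTree as ET


def sample_rows(source, k, seed=None):
    """Return up to k uniformly sampled <row> attribute dicts from an XML stream.

    `source` is a path or a binary file-like object (assumed to be an
    already-decompressed dump table such as Posts.xml).
    """
    rng = random.Random(seed)
    reservoir = []
    seen = 0

    # Parse incrementally so the whole file is never held in memory.
    context = iter(ET.iterparse(source, events=("start", "end")))
    _, root = next(context)  # first event is the start of the root element

    for event, elem in context:
        if event != "end" or elem.tag != "row":
            continue
        seen += 1
        if len(reservoir) < k:
            reservoir.append(dict(elem.attrib))
        else:
            # Reservoir sampling: keep the new row with probability k / seen.
            j = rng.randrange(seen)
            if j < k:
                reservoir[j] = dict(elem.attrib)
        root.clear()  # drop already-processed rows from the tree

    return reservoir
```

Usage would be something like `sample_rows("Posts.xml", 1000, seed=42)`, where the path is a placeholder for wherever the extracted file lives. Each sampled row comes back as a plain dict of its XML attributes, which is easy to hand to pandas or write out as parquet afterwards.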