
> Tweets alone generate petabytes of data a year

Nope. It's not Tweets that generate that data. It's the insane amount of (mostly unnecessary) noise that gets thrown into the mix: analytics, logs, metrics, you name it.

Every time you scroll, Twitter sends multiple events to the server. That alone will generate a large chunk of those petabytes.



No, that's the second link - generated data, separate from tweets.

Tweets alone generate petabytes of data a year.

https://ankush-chavan.medium.com/twitter-data-storage-and-pr...

Also, many people would disagree that stuff required to run a business is "mostly unnecessary".


No, they don't. In spite of the confusing wording in the post you cite, its petabytes/year claim is not derived from the 500m tweets/day claim – it must include metadata and/or multimedia.

This was all already derived (correctly) in the original post. Recapitulating:

500m tweets/day * (conservatively) 512B/tweet * 365 days/yr ~= 90 TiB/yr

Assuming compression and variable-length encoding of this long tail in colder storage, it's more likely <20 TiB/yr (<=115B/tweet on average)

Yes, this excludes analytics metadata, which as you suggest would not support Twitter's current ad products. But your core repeated claim about tweets alone is two orders of magnitude off.
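For anyone who wants to check the back-of-envelope numbers, here is a minimal sketch of the arithmetic. The 500m tweets/day, 512 B/tweet, and 115 B/tweet figures are the ones from this thread; the function and constant names are made up for illustration, and the result covers tweet text only (no media, no analytics).

```python
# Back-of-envelope estimate of yearly storage for tweet text alone.
# All inputs are the rough figures quoted in the thread, not measured values.

TWEETS_PER_DAY = 500_000_000   # "500m tweets/day" (thread's figure)
DAYS_PER_YEAR = 365
TIB = 2**40                    # bytes in one tebibyte


def yearly_bytes(bytes_per_tweet: int) -> int:
    """Total bytes per year at a given average size per tweet."""
    return TWEETS_PER_DAY * bytes_per_tweet * DAYS_PER_YEAR


def to_tib(num_bytes: int) -> float:
    """Convert a byte count to TiB."""
    return num_bytes / TIB


# Conservative upper bound: 512 B per tweet.
upper = yearly_bytes(512)
# Assumed average after compression / variable-length encoding: 115 B per tweet.
compressed = yearly_bytes(115)

print(f"512 B/tweet: {to_tib(upper):.1f} TiB/yr")       # 85.0 TiB/yr
print(f"115 B/tweet: {to_tib(compressed):.1f} TiB/yr")  # 19.1 TiB/yr
```

Under these assumptions the upper bound comes out near 85 TiB/yr (about 93 TB in decimal units), which rounds to the ~90 figure above, and the compressed estimate stays under 20 TiB/yr.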


> 500m tweets/day * (conservatively) 512B/tweet * 365 days/yr ~= 90 TiB/yr

I wonder if the "Petabytes" figure being claimed includes pictures/videos that can be attached to a Tweet. In that case, I could easily see "Petabytes/year" being accurate.



