Indeed, a quick lookup doesn't turn up many reliable-sounding sources, but they're all on the order of zettabytes (tens to thousands of them), even for years before any LLM was halfway usable. One has to wonder how much of that is generated content. I'm thinking of some of my own websites, where the pages are statistics derived from player highscores, or the websites that jokingly index all Bitcoin addresses and UUIDs.
Perhaps the 50TB estimate counts only unique information, without any media or the like, but OP can explain where they got that number from better than I can by guessing.