130T unique pages? That seems highly unlikely: spread over roughly 8 billion people, it averages out to more than 16,000 pages for every human being alive. If gp merely wants the texts of interest to themselves, as opposed to an accurate snapshot, it seems LLMs should be quite capable of that, one day.
It doesn't seem that hard to believe given how much automatically generated "content" (mostly garbage) there is.
I think a more interesting question is how much information there is on the internet, especially after optimal compression. I'm guessing this is a very difficult question to answer, but also that the answer is much higher than what LLMs currently store.
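As a very rough illustration of what an upper bound on that could look like: run a general-purpose compressor over a text sample and extrapolate the ratio. This is only a sketch; zlib is nowhere near "optimal compression" (a strong text model would do far better), and the sample text here is made up.

```python
import zlib

def compression_ratio(text: str) -> float:
    """Return compressed size as a fraction of the raw UTF-8 size."""
    raw = text.encode("utf-8")
    compressed = zlib.compress(raw, 9)  # max compression level
    return len(compressed) / len(raw)

# Hypothetical sample; highly repetitive text compresses very well,
# which is exactly the point about duplicated web content.
sample = "the quick brown fox jumps over the lazy dog " * 100
print(f"compressed to {compression_ratio(sample):.1%} of original size")
```

Extrapolating such a ratio over the crawlable web only gives a crude ceiling, but it already suggests the true information content is a small fraction of the raw byte count.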
Tweets count. HN posts also count (as high-quality texts, actually :). IoT devices reporting status from the same templates shouldn't qualify as unique pages (count the number of templates if you want). But if I search for some news, many almost-verbatim copies show up; those should only count as one, since we are looking for unique texts! A sketch of what that de-duplication could look like is below.
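A minimal sketch of how "count near-verbatim copies once" could work in practice: compare word-level shingles with Jaccard similarity and collapse documents above a threshold. The shingle size (5 words) and the 0.9 threshold are arbitrary choices for illustration, not anything standard.

```python
def shingles(text: str, k: int = 5) -> set:
    """Word-level k-shingles of a document (lowercased)."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two shingle sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def is_near_duplicate(doc_a: str, doc_b: str, threshold: float = 0.9) -> bool:
    """Treat two documents as the same 'unique text' if their shingle overlap is high."""
    return jaccard(shingles(doc_a), shingles(doc_b)) >= threshold
```

Syndicated news stories that differ only in headline or byline would score close to 1.0 and count once; genuinely different texts would not. At web scale you would approximate this with something like MinHash rather than pairwise comparison.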