I have a pet theory that there are two forms of the web: the document web and th...

marginalia_nu · on June 15, 2023

For application websites like the ones you listed, you'd typically end up building a special integration that pulls from their API or data dumps instead of crawling their pages. This is also true for GitHub, Stack Overflow, and even document-y websites like Wikipedia.

It's simply not feasible to treat them like any other website if you want to index their data.
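
To make that concrete, here is a minimal sketch of how a crawler might route certain hosts to a dedicated integration and fall back to ordinary crawling for everything else. The host table, the strategy names, and `choose_strategy` are all made up for illustration; this is not how any particular search engine does it.

```python
from urllib.parse import urlparse

# Hypothetical routing table: which sites get a dedicated integration.
# Both the hosts and the strategy labels are illustrative assumptions.
SPECIAL_SOURCES = {
    "github.com": "api",
    "stackoverflow.com": "data_dump",
    "en.wikipedia.org": "data_dump",
}


def choose_strategy(url: str) -> str:
    """Return the ingestion strategy to use for a URL."""
    host = urlparse(url).hostname or ""
    # Treat "www.example.com" and "example.com" as the same site.
    if host.startswith("www."):
        host = host[len("www."):]
    # Anything without a dedicated integration is crawled like a normal site.
    return SPECIAL_SOURCES.get(host, "generic_crawl")


if __name__ == "__main__":
    for url in (
        "https://github.com/some/repo",
        "https://en.wikipedia.org/wiki/Web_crawler",
        "https://example.com/blog/post",
    ):
        print(url, "->", choose_strategy(url))
```

The actual work is of course in the per-site integrations behind each label; the lookup only shows where the special-casing would hook in.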