
I know a lot of FOSS people are hostile to AI in general, and this is an immediate problem, but I feel like a better solution for everyone would be for there to be some sort of central repo of this information that AI companies can pull from without externalizing their costs like this.



Are you suggesting that everyone move their projects to a single code forge (GitHub)?

Also, isn't this basically just extortion? "I know you're minding your own business, FOSS maintainer, but move your code to our recommended forge instead so we can stop DDoSing you?"


I'm mostly suggesting a mirror to some centralized location. GitHub is probably a good place to mirror to for code, but it could be elsewhere.

I was actually thinking of something more general than just code, e.g. similar to CommonCrawl, but maybe a code-specific thing is what is needed.


Isn't this still similar to extortion? Maintainers aren't creating the problem. They are minding their own business until scrapers come along and make too many unnecessary requests. Seems like the burden is clearly on the scrapers. They could easily be hitting the pages much less often for a start.

Doesn't your suggestion shift the responsibility to likely under-sponsored FOSS maintainers rather than companies? Also, how do people agree to switch to some centralized repository and how long does that take? Even if people move over, would that solve the issue? How would a scraper know not to crawl a maintainer's site? Scrapers already ignore robots.txt, so they'd probably still crawl even if you verified you've uploaded the latest content.


Scrapers still have an economic incentive to do what is easiest. Providing an alternative that is easier than fighting sysadmin blocks would likely cause them to take the easier route and make it less of a cat and mouse game for sysadmins.


Scrapers have an incentive to take all data.

If you put some data in a central repository, they will take it.

Then they will go and DDoS the rest of the Internet in order to take all the rest of the data.


Why buy the cow when you get the milk for free tho?


Because sysadmins are taking separate actions to make the milk expensive.


> hostile to AI in general

What?! "AI"?!?! We are talking about traffic abusers!...



