
This has been happening already. The market is trying really hard to price out web scraping through scraper detection technologies and it's kinda working - scraping is becoming non-existent in user-space apps. It's also extremely discriminatory. Try running a single scrape from a developing country's IP on Linux and you'll be blocked at the TLS step lol


> The market is trying really hard to price out web scraping... scraping is becoming non-existent in user-space apps

Uhh... Those two matters are pretty much unrelated to each other. Scraping is becoming non-existent because the era of static web pages has ended. No need to "scrap" when you have a nice, performant JSON REST API provided for you.


SSG vs SSR really has nothing to do with whether an API exists to provide the data you would otherwise need to scrape.

When was the last time you saw a site with a JSON API providing metadata, like the JSON-LD for a product on an e-commerce site? Or an API just for the Open Graph data? How would you even discover these APIs for sites that you don't own?
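
To make that concrete, here is a rough sketch of what pulling that metadata out of a page usually looks like when no API exposes it. It uses only the Python standard library; the `MetadataParser` class, the `extract_metadata` helper, and the sample page are all made up for illustration, and a real page would be fetched over HTTP first.

```python
import json
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collects JSON-LD blocks and Open Graph <meta> tags from an HTML page."""

    def __init__(self):
        super().__init__()
        self.json_ld = []      # parsed JSON-LD objects, in document order
        self.open_graph = {}   # e.g. {"og:title": "..."}
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("type") == "application/ld+json":
            # Start buffering the script body; it arrives via handle_data.
            self._in_json_ld = True
            self._buffer = []
        elif tag == "meta" and (attrs.get("property") or "").startswith("og:"):
            self.open_graph[attrs["property"]] = attrs.get("content") or ""

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed JSON-LD instead of failing the whole page


def extract_metadata(html):
    """Return (json_ld_objects, open_graph_dict) for an HTML string."""
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    return parser.json_ld, parser.open_graph


# Hypothetical product page, inlined so the sketch runs without network access.
SAMPLE_HTML = """
<html><head>
<meta property="og:title" content="Example Widget">
<meta property="og:type" content="product">
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Example Widget",
 "offers": {"@type": "Offer", "price": "19.99", "priceCurrency": "USD"}}
</script>
</head><body></body></html>
"""

json_ld, open_graph = extract_metadata(SAMPLE_HTML)
print(json_ld[0]["name"], open_graph["og:title"])
```

None of this is discoverable through an API; you only find it by fetching and parsing the HTML itself.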

It's also worth noting that very, very few JSON APIs today are actually REST. They rarely include all the context needed, and in general JSON is much less useful than XML when you're talking to other APIs that you don't own since JSON can't easily describe the shape and datatypes of the content.


> No need to "scrap" when you have a nice, performant JSON REST API provided for you.

There are no performant JSON REST APIs provided these days, though. The days of public APIs are long gone.


HTML "APIs" weren't meant for the public either.

In practice, if there is a mobile app, there is an API. Whether its creators object to your usage is mostly their own problem.


But of course search engines are fine


Having your cake and eating it too is a natural goal of every business, and honestly it was just a matter of time until websites figured out they could have the benefits of public data while avoiding the costs. Web scraping and botting are basically a solved problem too: just put the data behind a login gate, which lets you take legal action against scrapers and bots. Done. However, nobody wants to lose the benefits of public data, so here we are.


I used to care about respecting robots.txt until it was clear that established search engines are fine but any newcomers can go right to hell.
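
For anyone who hasn't seen the pattern, a minimal sketch with Python's standard `urllib.robotparser` shows how it plays out. The robots.txt content, the `NewcomerBot` user agent, and the `allowed` helper are all hypothetical, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one established crawler is welcome, everyone else is not.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from memory, no network fetch
    return parser.can_fetch(user_agent, url)


url = "https://example.com/products/1"
print("Googlebot:", allowed("Googlebot", url))
print("NewcomerBot:", allowed("NewcomerBot", url))
```

Under these rules the named crawler gets the whole site while any agent that falls through to the `*` entry gets nothing.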



