
I just don't get why people use web scraping as a battleground for moral ethics.

It's bizarre, just like equating copyright infringement with theft of property.

Where does this moral high ground come from? Nobody scraping is thinking "oh I'm so evil, I'm scraping without respecting robots.txt and using residential IP addresses to bypass detection".

Google does it nobody has a problem, but when the little guy does it, suddenly they are an outlaw.



> Google does it nobody has a problem

Historically, when Google did it, they did it to create an index, which a lot of people found useful as a way to find information they were looking for. This used to mean people would come and visit your website, where they could engage with the website creator directly through a variety of different means.

Google doing it now to digest all the content and mulch it all together to return a regurgitated form of it is a very different proposition, and that is what people are annoyed about when "the little guys" (funny name for startups with multiple billions of dollars of raised capital) are doing the same thing.

For many it's not about "moral ethics", it's about actual survival. If nobody is visiting their website, nobody is buying their products or engaging with their community or whatever.

If you're scraping content for no other purpose than to mechanistically reword it for commercial purposes, then it's not really surprising that people have issues with it.


You're taking someone else's labor and profiting off it, without any credit or compensation. To add insult to injury, the person you're scraping pays money to support your traffic. It's a one-sided transaction.


You can’t generalise that. Maybe I crawl to provide an annotated preview of their website, to make users of my application more likely to click the link and visit it? There are lots of ways in which crawling benefits everyone; it just requires some mutual respect.


We don't live in the 90s anymore. The bandwidth and CPU cost is moot unless you are spinning up thousands of GPUs to render an HTML page.

Also, taking someone else's labor and profiting off it is exactly what Google does.


What Google does is mutually beneficial. Stealing content to reheat and serve in an LLM is very different.


The Google that creates a monopoly on top of the scraped data and uses third-party content to do so?

An LLM just places that ability at the user's end (a local LLM, strictly speaking).


No it doesn't. The end state of a Google search is a human interacting directly with a site.



