I just don't get why people use web scraping as a battleground for moral ethics ...

trog · on Aug 17, 2024

> Google does it nobody has a problem

Historically, when Google did it, they did it to create an index, which a lot of people found useful as a way to find information they were looking for. This used to mean people would come and visit your website, where they could engage with the website creator directly through a variety of different means.

Google doing it now to digest all the content and mulch it all together to return a regurgitated form of it is a very different proposition, and that is what people are annoyed about when "the little guys" (funny name for startups with multiple billions of dollars of raised capital) are doing the same thing.

For many it's not about "moral ethics", it's about actual survival. If nobody is visiting their website, nobody is buying their products or engaging with their community or whatever.

If you're scraping content for no other purpose than to mechanistically reword it for commercial purposes, then it's not really surprising that people have issues with it.

__loam · on Aug 17, 2024

You're taking someone else's labor and profiting off it, without any credit or compensation. To add insult to injury, the person you're scraping pays money to support your traffic. It's a one-sided transaction.

9dev · on Aug 17, 2024

You can’t generalise that. Maybe I crawl to provide an annotated preview of their website, to make users of my application more likely to click the link and visit it? There are lots of ways in which crawling benefits everyone; it just requires some mutual respect.

deisteve · on Aug 17, 2024

We don't live in the 90s anymore. The bandwidth and CPU cost is moot unless you are spinning up thousands of GPUs to render an HTML page.

Also, taking someone else's labor and profiting from it is exactly what Google does.

__loam · on Aug 17, 2024

What Google does is mutually beneficial. Stealing content to reheat and serve in an LLM is very different.

deisteve · on Aug 18, 2024

The Google that builds a monopoly on top of scraped data, and uses third-party content to do so?

An LLM just places that ability in the hands of the user (a local LLM, strictly speaking).

__loam · on Aug 18, 2024

No it doesn't. The end state of a Google search is a human interacting directly with a site.