If your browser doesn't play nicely and obey robots.txt when it's headless, I don't think it's that crazy to block the browser and not the user.
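To be fair, obeying robots.txt is cheap. A minimal sketch using only Python's stdlib; the URL and bot name are placeholders, not anything from this project:

    # Minimal sketch of "playing nicely": consult robots.txt
    # before fetching. URL and bot name are placeholders.
    from urllib.robotparser import RobotFileParser

    rp = RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()  # fetches and parses the robots.txt file

    if rp.can_fetch("MyHeadlessBot/1.0", "https://example.com/some/page"):
        print("allowed: fetch the page")
    else:
        print("disallowed: back off")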


Every tool can be used in a good or bad way: Chrome, Firefox, cURL, etc. It's not the browser that misbehaves; it's the user.

It's the user's responsibility to behave well, like in life :)


The first thing that came to mind when I saw this project wasn't scraping (where I'd typically want either a less detectable browser or a more performant option), but a browser engine that's actually sane to link against if I wanted to, e.g., write a modern TUI browser.

Banning the root library (even if you could, given UA spoofing and whatnot) is right up there with banning Chrome to keep out low-wage scraping centers and their armies of employees. It's not even a little effective, and it risks significant collateral damage.


It is trivial to spoof a user-agent. If you want to stop a motivated scraper, you need a different solution, one that exploits the fact that bots use a headless browser.
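To illustrate how trivial: a sketch using Python's requests library, where the UA string is just an example Chrome string:

    # Sketch: spoofing a user-agent is one header.
    # The UA string below is an example, nothing more.
    import requests

    headers = {
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/120.0.0.0 Safari/537.36"
        )
    }
    resp = requests.get("https://example.com", headers=headers)
    print(resp.status_code)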


> it is trivial to spoof user-agent

It's also trivial to detect spoofed user agents via fingerprinting. The best defense against scrapers is layered, with a user-agent name block as the bare minimum.
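One such layer, as a rough heuristic sketch (this assumes recent Chrome's default Sec-CH-UA client hints and case-normalized header names; it is nowhere near a complete defense):

    # Heuristic sketch: recent Chrome sends Sec-CH-UA client-hint
    # headers by default over HTTPS, so a request whose User-Agent
    # claims Chrome but lacks them is suspect. Assumes header names
    # are already case-normalized.
    def looks_spoofed(headers: dict[str, str]) -> bool:
        ua = headers.get("User-Agent", "")
        if "Chrome/" in ua and "Sec-CH-UA" not in headers:
            return True  # claims Chrome, missing client hints
        return False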



