
Building scrapers sucks.

It's generally not hard because the problem is conceptually difficult, or because it requires extremely high-level reasoning.

It sucks because when someone changes "<section class='bio'>" to "<div class='section bio'>" your scraper breaks. I just want the bio, and it's obvious what to grab, but machines have no nuance.

LLMs have enough common sense to deal with these things, and they take almost no time to work with. I can throw HTML at a model with a vague description and pull out structured data, no engineer required, and it'll probably keep working when the page changes.
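A minimal sketch of what that pipeline looks like. Here `call_llm` is a stand-in for whichever model API you use (it's stubbed out so the example runs); the prompt shape and field description are my own illustration, not a specific product's API:

```python
import json

def call_llm(prompt: str) -> str:
    # Stub standing in for a real model call (OpenAI, Anthropic, a local
    # model, etc.) -- a real implementation would hit an API here.
    return '{"bio": "Jane Doe is a systems programmer."}'

def extract(html: str, description: str) -> dict:
    # Ask for JSON only, so the response parses into structured data.
    prompt = (
        "Extract the following from this HTML and reply with JSON only.\n"
        f"Fields: {description}\n\nHTML:\n{html}"
    )
    return json.loads(call_llm(prompt))

# The same call works whether the bio lives in <section class='bio'> or
# <div class='section bio'> -- the model reads content, not markup.
data = extract(
    "<div class='section bio'>Jane Doe is a systems programmer.</div>",
    "bio: the person's biography",
)
```

The point is that nothing here encodes the page structure; swapping the markup underneath doesn't touch the code.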

There's a huge number of one-off jobs people will do where perfect isn't the goal, and a fast solution + a bit of cleanup is hugely beneficial.



Another approach is to use a regexp scraper. These are very "loose" and tolerant of changes. For example, RNSAFFN.com uses regular expressions to scrape the Commitments of Traders report from the Commodity Futures Trading Commission every week.
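To illustrate what "loose" means here, a pattern can key on content landmarks (a class token, surrounding text) rather than exact markup, so it survives tag swaps. This is an illustrative pattern of mine, not the one RNSAFFN.com actually uses:

```python
import re

# Match any tag whose class attribute contains the token "bio",
# regardless of tag name or extra classes.
BIO_RE = re.compile(
    r"<\w+[^>]*class=['\"][^'\"]*\bbio\b[^'\"]*['\"][^>]*>(.*?)</\w+>",
    re.DOTALL,
)

old = "<section class='bio'>Jane codes.</section>"
new = "<div class='section bio'>Jane codes.</div>"

# The same pattern extracts the bio from both versions of the markup.
assert BIO_RE.search(old).group(1) == "Jane codes."
assert BIO_RE.search(new).group(1) == "Jane codes."
```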


My experience has been the opposite: regex scrapers are usually incredibly brittle, and also harder to debug when something DOES change.

My preferred approach for scraping these days is Playwright for Python, using CSS selectors to pull things out of the DOM. Still prone to breakage, but reasonably pleasant to debug with the browser DevTools.
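A sketch of that approach, assuming Playwright is installed and browsers have been fetched with `playwright install`. The URL and the `.bio` selector are illustrative:

```python
def scrape_bio(url: str) -> str:
    # Deferred import so the sketch can be read without Playwright present.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        # A class selector like ".bio" matches both <section class="bio">
        # and <div class="section bio">, so it tolerates some tag churn.
        text = page.locator(".bio").first.inner_text()
        browser.close()
        return text

if __name__ == "__main__":
    print(scrape_bio("https://example.com/profile"))
```

Selectors you can test live in the DevTools console (`document.querySelector(".bio")`) are exactly the ones you hand to `page.locator`, which is what makes the debugging loop pleasant.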



