What astounds me is there are no readily available libraries crawler authors can...

JimDabell · 2025-07-17T05:47:42 1752731262

robots.txt support is built into the Python stdlib as urllib.robotparser: https://docs.python.org/3/library/urllib.robotparser.html
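A minimal sketch of using it. It assumes a made-up robots.txt passed to `parse()` so no network request is needed; `MyCrawler` and `BadBot` are hypothetical user-agent names.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed offline so the example needs no network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5

User-agent: BadBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# can_fetch(useragent, url) applies the rules of the group matching the user agent,
# falling back to the "*" group when no named group matches.
print(parser.can_fetch("MyCrawler", "https://example.com/public/page"))   # True
print(parser.can_fetch("MyCrawler", "https://example.com/private/data"))  # False
print(parser.can_fetch("BadBot", "https://example.com/public/page"))      # False

# crawl_delay() returns the Crawl-delay value (in seconds) for the matching group.
print(parser.crawl_delay("MyCrawler"))  # 5
```

For a live site you would call `parser.set_url("https://example.com/robots.txt")` and then `parser.read()` instead of `parse()`. Note that this parser does plain prefix matching on paths, so it does not implement the `*` and `$` path wildcards that some robots.txt files use.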

rel=nofollow is a bad name. It doesn’t actually forbid following the link and doesn’t serve the same purpose as robots.txt.

The problem it was trying to solve was that spammers would add links to their site anywhere they could, and Google would treat each such link as the page it appeared on endorsing the linked page as relevant content. rel=nofollow basically means “we do not endorse this link”. The specification makes this clearer:

> By adding rel="nofollow" to a hyperlink, a page indicates that the destination of that hyperlink should not be afforded any additional weight or ranking by user agents which perform link analysis upon web pages (e.g. search engines).

> nofollow is a bad name […] does not mean the same as robots exclusion standards

— https://microformats.org/wiki/rel-nofollow
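To illustrate the difference, here is a rough sketch of how a link-analysis crawler might treat the attribute, using only the stdlib `html.parser`. The `LinkCollector` class and the sample HTML are hypothetical. Note that nofollow links are still collected (the crawler may follow them); they are just kept out of the "endorsed" set.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects <a href> targets, split by whether rel contains the nofollow token."""

    def __init__(self):
        super().__init__()
        self.endorsed = []  # links that may count toward link analysis
        self.nofollow = []  # links the page explicitly does not endorse

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)  # HTMLParser already lowercases attribute names
        href = attributes.get("href")
        if not href:
            return
        # rel is a space-separated, case-insensitive token list, e.g. rel="external nofollow".
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)
        else:
            self.endorsed.append(href)


page_html = """
<p>Great post! Visit <a href="https://spam.example/" rel="External NoFollow">my site</a>.</p>
<p>See the <a href="https://microformats.org/wiki/rel-nofollow">spec</a>.</p>
"""

collector = LinkCollector()
collector.feed(page_html)
print(collector.endorsed)  # ['https://microformats.org/wiki/rel-nofollow']
print(collector.nofollow)  # ['https://spam.example/']
```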

yodon · 2025-07-17T06:15:39 1752732939

Thanks for this!

micromacrofoot · 2025-07-17T18:35:06 1752777306

The "good" bot writers rarely have enough resources to demolish servers blindly, and are generally more careful whether or not you make it easier, so there's not much incentive.

codingminds · 2025-07-17T05:50:26 1752731426

I don't see a reason why a good bot operator couldn't build a parser lib in a different language and put it on a public repo.

Shouldn't be that hard if someone WANTS to be good.

elric · 2025-07-17T06:47:38 1752734858

Sure, but it's always easier to use a tool that's been tried and tested.