I manage a few thousand client sites spread across a bunch of different infrastructures, but one thing they all have in common is a set of robots.txt rules to block AI crawlers.
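The rules themselves are nothing exotic; something along these lines, using the crawler tokens each vendor publishes (the exact token list varies per site, so treat these names as representative rather than the actual config):

    # Blanket disallow for AI crawlers (tokens are examples; check each vendor's docs)
    User-agent: GPTBot
    User-agent: ClaudeBot
    User-agent: anthropic-ai
    User-agent: CCBot
    User-agent: Google-Extended
    Disallow: /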
Not a single one of them respects that request anymore. Anthropic is the worst at the moment. It starts hitting the site with an Anthropic user agent; you can literally watch it fetch the robots file, then switch its user agent to a generic browser one and carry on.
I say "at the moment" because they're all pulling this kind of crap; they seem to take it in turns ramping up and hammering servers.
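If you want to spot that robots-then-browser pattern in your own logs, here's a rough sketch. It assumes nginx/Apache combined log format and that the follow-up requests come from the same IP; the user-agent substrings are assumptions you'd tune for what you actually see:

    #!/usr/bin/env python3
    """Sketch: flag IPs that fetch robots.txt with an AI-crawler UA and
    later request pages with a generic browser UA. Assumes combined log
    format on stdin; UA substrings and field layout are assumptions."""
    import re
    import sys

    # ip ident user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
    LINE_RE = re.compile(
        r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:\S+) (\S+) [^"]*" \d+ \S+ "[^"]*" "([^"]*)"'
    )
    AI_UA = re.compile(r'claude|anthropic|gptbot|ccbot|bytespider', re.IGNORECASE)
    BROWSER_UA = re.compile(r'Mozilla/5\.0')  # generic browser marker

    saw_robots_as_bot = set()  # IPs that hit /robots.txt with an AI UA
    suspects = set()           # same IPs later seen with a browser UA

    for line in sys.stdin:
        m = LINE_RE.match(line)
        if not m:
            continue
        ip, path, ua = m.groups()
        if path.startswith("/robots.txt") and AI_UA.search(ua):
            saw_robots_as_bot.add(ip)
        elif ip in saw_robots_as_bot and BROWSER_UA.search(ua) and not AI_UA.search(ua):
            suspects.add(ip)

    for ip in sorted(suspects):
        print(ip)

This only catches same-IP switches; a crawler that rotates IPs between the robots.txt fetch and the follow-up requests would need correlation on something else (ASN, timing, request patterns).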
> Not a single one of them respects that request anymore
AI companies have seen the prisoner's dilemma of good internet citizenship and slammed hard on the "defect" button. They plan to steal your content, put it in an attribution-removing blender, then sell it to other people, with the explicit purpose of replacing human contribution on the internet and in the workplace. After all, they represent a reality-distorting amount of capital, so why should any rules apply to them?