And they do have (the same) robots.txt on every domain, tailored for GPTBot, i.e...

AgentME · on April 11, 2024

All the lines related to GPTBot are commented out. That robots.txt isn't trying to block it. Either it has been changed recently or most of this comment thread is mistaken.

Pannoniae · on April 11, 2024

It wasn't commented out a few hours ago when I checked it. I think that's a recent change.

cwillu · on April 11, 2024

It is common for a crawler to access a directly referenced page in order to receive the noindex header and/or meta tag, whose semantics are not implied by “Disallow: /”.

And then all the links are to external domains, which aren't subject to the first site's robots.txt.
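To make the distinction concrete, here is a rough Python sketch of the two separate signals: the robots.txt crawl rule and the noindex directive that a crawler only sees once it has fetched the page. The names `may_crawl`, `is_noindex`, `RobotsMetaParser`, `ExampleBot` and the example.com URL are made up for illustration, and the header handling is simplified (exact `X-Robots-Tag` key, no per-bot prefixes), so treat it as a sketch rather than how any real crawler works.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for item in attr.get("content", "").split(","):
                self.directives.add(item.strip().lower())


def may_crawl(robots_txt, user_agent, url):
    """Crawl permission according to robots.txt only (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def is_noindex(headers, html):
    """Indexing signal, visible only after the page has been fetched.

    Assumes `headers` is a plain dict keyed by the exact name "X-Robots-Tag".
    """
    header_value = headers.get("X-Robots-Tag", "")
    header_directives = {d.strip().lower() for d in header_value.split(",")}
    meta = RobotsMetaParser()
    meta.feed(html)
    return "noindex" in header_directives or "noindex" in meta.directives


robots_txt = "User-agent: *\nDisallow: /"
page = '<html><head><meta name="robots" content="noindex"></head></html>'

# robots.txt says "do not crawl"; it says nothing about indexing.
print(may_crawl(robots_txt, "ExampleBot", "https://example.com/page"))
# The noindex signal lives in the response itself.
print(is_noindex({}, page))
```

The two checks answer different questions, which is the point: a crawler that strictly obeys “Disallow: /” never fetches the page and so never learns about the noindex.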

andybak · on April 11, 2024

This is a moderately persuasive argument.

Although the crawler should probably ignore the entire HTML body. But it does feel like a grey area if I accept your first pint.

kubanczyk · on April 12, 2024

You've been able to convince me to accept his second pint. Friday it is.

fsckboy · on April 11, 2024

Humans don't read/respect robots.txt, so in order to pass the Turing test, AIs need to mimic human behavior.

gunapologist99 · on April 11, 2024

This must be why self-driving cars always ignore the speed limit. ;)

microtherion · on April 11, 2024

More directly, Tesla, for example, boasts of training their FSD on data captured from their customers' unassisted driving. So it's hardly surprising that it imitates a lot of humans' bad habits, e.g. rolling past stop lines.

roughly · on April 11, 2024

Jesus, that’s one of those ideas that looks good to an engineer but is why you really need to hire someone with a social sciences background (sociology, anthropology, psychology, literally anyone whose work includes humans), and probably should hire two, so the second one can tell you why the first died of an aneurysm after you explained your idea.

yreg · on April 11, 2024

AI DRIVR claims that beta V12 is much better precisely because it takes rules less literally and drives more naturally.