
And they do have (the same) robots.txt on every domain, tailored for GPTBot, e.g. https://petra-cody-carlene.web.sp.am/robots.txt

So, GPTBot is not following robots.txt, apparently.




All the lines related to GPTBot are commented out. That robots.txt isn't trying to block it. Either it has been changed recently or most of this comment thread is mistaken.
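To illustrate why commented-out lines do not block anything, here is a minimal sketch using Python's standard `urllib.robotparser`. The two robots.txt bodies and the example.com URL are hypothetical stand-ins, not the contents of the file linked above.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Parse a robots.txt body and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical file that actively blocks GPTBot from the whole site.
ACTIVE = """\
User-agent: GPTBot
Disallow: /
"""

# The same rules commented out: everything after '#' is ignored,
# so no rule applies to GPTBot and fetching is allowed by default.
COMMENTED = """\
# User-agent: GPTBot
# Disallow: /
"""

URL = "https://example.com/page.html"  # placeholder URL

print(can_fetch(ACTIVE, "GPTBot", URL))
print(can_fetch(COMMENTED, "GPTBot", URL))
```

With the rules active the parser reports the URL as disallowed for GPTBot; with the same lines commented out, it reports the URL as allowed.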


It wasn't commented out a few hours ago when I checked it. I think that's a recent change.


Accessing a directly referenced page is common in order to receive the noindex header and/or meta tag, whose semantics are not implied by “Disallow: /”

And then all the links are to external domains, which aren't subject to the first site's robots.txt
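As a rough sketch of the first point: a crawler only learns about noindex by actually fetching the page, since the directive lives in the `X-Robots-Tag` response header or in a `<meta name="robots">` tag. The helper below is hypothetical and simplified; it ignores the user-agent-scoped form of the header (such as `googlebot: noindex`).

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(headers: dict, html: str) -> bool:
    """Return True if the response header or the page markup says noindex."""
    header_directives = set()
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_directives.update(d.strip().lower() for d in value.split(","))
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in header_directives or "noindex" in parser.directives
```

Neither signal is visible from robots.txt alone, which is why a crawler that never requests the page cannot see it.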


This is a moderately persuasive argument.

Although the crawler should probably ignore all the html body. But it does feel like a grey area if I accept your first pint.


You've been able to convince me to accept his second pint. Friday it is.


Humans don't read or respect robots.txt, so in order to pass the Turing test, AIs need to mimic human behavior.


This must be why self-driving cars always ignore the speed limit. ;)


More directly, Tesla boasts of training their FSD on data captured from their customers' unassisted driving. So it's hardly surprising that it imitates a lot of humans' bad habits, e.g. rolling past stop lines.


Jesus, that’s one of those ideas that looks good to an engineer but is why you really need to hire someone with a social sciences background (sociology, anthropology, psychology, literally anyone whose work includes humans), and probably should hire two, so the second one can tell you why the first died of an aneurysm after you explained your idea.


AI DRIVR claims that beta V12 is much better precisely because it takes rules less literally and drives more naturally.



