Sure, a non-human's performance "should" be capped at ~50% over a large sample: an interrogator who genuinely can't tell the two apart is effectively flipping a coin. Seeing a much higher rate, like 73%, indicates systematic error on the interrogators' part. This -- the fact that humans are not good at recognizing genuine human behaviour -- is really a problem with the Turing test itself, but I don't see a good way to solve it.
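
To put a number on the 73%-vs-50% intuition: if judges really were at chance, each "human" verdict would be a fair coin flip, and a 73% rate becomes vanishingly unlikely as the sample grows. A minimal sketch in Python; the trial counts are hypothetical, since the study's actual sample size isn't given here:

    # Chance of seeing a "judged human" rate of 73% or higher if each
    # verdict were really a fair coin flip (the ~50% ceiling above).
    # Trial counts n are hypothetical, not the study's actual N.
    from math import comb

    def p_at_least(k: int, n: int, p: float = 0.5) -> float:
        # Exact binomial tail: P(X >= k) for X ~ Binomial(n, p).
        return sum(comb(n, i) * p**i * (1 - p)**(n - i)
                   for i in range(k, n + 1))

    for n in (50, 100, 500):      # hypothetical numbers of trials
        k = round(0.73 * n)       # verdicts at a 73% "human" rate
        print(f"n={n}: P(rate >= 73% | chance) = {p_at_least(k, n):.2e}")

Even at n=100 the tail probability is on the order of 10^-6, so 73% sits far outside chance variation -- which is the point: it reflects systematic bias in the judges, not noise.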

LLaMa 3.1 with the same prompt "only" managed to be judged human 56% of the time, so perhaps its behaviour is actually the closer match to a real human's.


