
It gets the Sally question correct, but it takes more than 100 lines of reasoning.

>Sally has three brothers. Each brother has two sisters. How many sisters does Sally have?

Here is the answer: https://pastebin.com/JP2V92Kh



This line had me laughing:

"But that doesn't make sense because Sally can't be her own sister."

Having said this, how many 'lines' of reasoning does the average human need? It's perhaps a weird comparison, but the point is: does it really matter whether it needs 100 or 100k 'lines'? If it could eventually hide that (just as we hide our thoughts, or can't even really access the semi-parallel things our brain does to arrive at an answer), summarise it, and give the correct answer, that would be acceptable.



The implementation details don't matter. LLMs' inability to properly reason, though, is a fundamental limitation, and no amount of re-running will help.


In fairness, it actually works out the correct answer fairly quickly (20 lines, including a false start and its correction). It seems to have identified, correctly, that this is a tricky question it is struggling with, so it does a lot of checking.


> Let me check online for similar problems.

And finally googles the problem, like we do :)


It seems obvious to me that she has one sister. Or is that the naive, wrong answer?


While Sally is usually a girl's name, the question never states that. So Sally could actually be a boy, and in that case Sally would have two sisters.
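For anyone who wants the arithmetic spelled out, here's a quick Python sketch covering both readings (purely illustrative; the function name and the boolean flag are my own):

    def sallys_sisters(sally_is_a_girl: bool) -> int:
        sisters_per_brother = 2
        # Each brother's "two sisters" are all the girls in the family.
        girls_in_family = sisters_per_brother
        if sally_is_a_girl:
            # Sally is one of those girls and can't be her own sister.
            return girls_in_family - 1
        # If Sally is a boy, every girl in the family is his sister.
        return girls_in_family

    print(sallys_sisters(True))   # 1, the usual reading
    print(sallys_sisters(False))  # 2, if Sally is a boy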


You’ll get some eye rolls from people when you bring that up for Sally.

But there are several names that used to be considered male but are now female, like Leslie and Marion. I don't think I've ever met a man named Marion, but you still occasionally run into a Leslie.

It would be interesting to start using Leslie for this little logic puzzle and see how that affects people’s answers.


Fair enough.


brilliant


overthinking is also a problem o1 struggles with


I should have read the blog post. This is a known issue:

>Recursive Reasoning Loops: The model may enter circular reasoning patterns, leading to lengthy responses without a conclusive answer.

So the 100 lines was probably not necessary.


I don't think this question is super hard. ChatGPT 4o mini gets this one correct consistently without being asked to reason step by step.



