"But that doesn't make sense because Sally can't be her own sister."
Having said this, how many 'lines' of reasoning does the average human need? It's perhaps a weird comparison, but the point is: does it really matter whether it needs 100 or 100k 'lines'? If it could eventually hide that (just as we hide our thoughts, or can't even really access the semi-parallel things our brain does to come to an answer), summarise it, and give the correct answer, wouldn't that be acceptable?
The implementation details don't matter. LLMs not being able to properly reason, though, is a fundamental limitation, and no amount of re-running will help.
In fairness, it actually works out the correct answer fairly quickly (20 lines, including a false start and a correction thereof). It seems to have identified (correctly) that this is a tricky question that it is struggling with, so it does a lot of checking.
You’ll get some eye rolls from people when you bring that up for Sally.
But there are several names that used to be considered male but are now female, like Leslie and Marion. I don’t think I’ve ever met a man named Marion, but you still occasionally run into a Leslie.
It would be interesting to start using Leslie for this little logic puzzle and see how that affects people’s answers.
>Sally has three brothers. Each brother has two sisters. How many sisters does Sally have?
Here is the answer: https://pastebin.com/JP2V92Kh
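For what it's worth, the puzzle is easy to check by brute force. This is only a rough sketch, assuming Sally is female and that everyone in the family is a full sibling, so every brother has exactly the same sisters. The names `count_sisters` and `solve` are made up for illustration.

```python
def count_sisters(family, name):
    """Count the girls in the family other than the named person."""
    return sum(1 for n, sex in family if sex == "F" and n != name)


def solve(brothers=3, sisters_per_brother=2, max_girls=10):
    """Try families with more and more girls until the puzzle's constraint holds.

    Assumption: all children are full siblings, so each brother's
    sisters are simply all the girls in the family, Sally included.
    """
    for extra_girls in range(max_girls):
        family = [("Sally", "F")]
        family += [(f"sister{i}", "F") for i in range(extra_girls)]
        family += [(f"brother{i}", "M") for i in range(brothers)]

        # Check the stated constraint for every brother.
        if all(
            count_sisters(family, n) == sisters_per_brother
            for n, sex in family
            if sex == "M"
        ):
            # Sally does not count as her own sister.
            return count_sisters(family, "Sally")
    return None


print(solve())  # prints 1
```

The number of brothers never changes the result. Under these assumptions the two sisters each brother has are Sally plus one other girl, so Sally has one sister.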