
It gets the Sally question correct, but it takes more than 100 lines of reasoning.

>Sally has three brothers. Each brother has two sisters. How many sisters does Sally have?

Here is the answer: https://pastebin.com/JP2V92Kh



This line had me laughing:

"But that doesn't make sense because Sally can't be her own sister."

Having said this, how many 'lines' of reasoning does the average human need? It's perhaps a weird comparison, but the point is: does it really matter whether it needs 100 or 100k 'lines'? If it could eventually hide that (just as we hide our thoughts, or can't even really access the semi-parallel things our brain does to arrive at an answer), summarise it, and give the correct answer, that would be acceptable.



The implementation details don't matter. LLMs' inability to properly reason, though, is a fundamental limitation, and no amount of re-running will help.


In fairness, it actually works out the correct answer fairly quickly (20 lines, including a false start and its correction). It seems to have identified, correctly, that this is a tricky question it is struggling with, so it does a lot of checking.


> Let me check online for similar problems.

And finally googles the problem, like we do :)


It seems obvious to me that she has one sister. Or is that the naive, wrong answer?


While Sally is usually a girl's name, the question never states that. So Sally could actually be a boy, and in that case Sally would have two sisters.
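For anyone who wants the arithmetic spelled out, here's a quick Python sketch covering both readings (purely illustrative; the function name and the boolean flag are my own):

    def sallys_sisters(sally_is_a_girl: bool) -> int:
        sisters_per_brother = 2
        # Each brother's "two sisters" are all the girls in the family.
        girls_in_family = sisters_per_brother
        if sally_is_a_girl:
            # Sally is one of those girls and can't be her own sister.
            return girls_in_family - 1
        # If Sally is a boy, every girl in the family is his sister.
        return girls_in_family

    print(sallys_sisters(True))   # 1, the usual reading
    print(sallys_sisters(False))  # 2, if Sally is a boy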


You’ll get some eye rolls from people when you bring that up for Sally.

But there are several names that used to be considered male but are now female, like Leslie and Marion. I don't think I've ever met a man named Marion, but you still occasionally run into a Leslie.

It would be interesting to start using Leslie for this little logic puzzle and see how that affects people’s answers.


Fair enough.


brilliant


overthinking is also a problem o1 struggles with


I should have read the blog post. This is a known issue:

>Recursive Reasoning Loops: The model may enter circular reasoning patterns, leading to lengthy responses without a conclusive answer.

So the 100 lines was probably not necessary.


I don't think this question is super hard. ChatGPT 4o mini gets this one correct consistently without being asked to reason step by step.



