Given your set of theoreticals, I would concede that yes, the model is reasoning. At that point, though, the world would probably be far more interested in your discovery of a question that can only be answered via reasoning, uninfluenced by and without parallel in any empirical phenomenon, including written knowledge as a medium of transfer. The core issue I see here is your ability to prove that the model is actually reasoning in a concrete way that isn't just a simulacrum, as the Apple researchers et al. theorize it to be.
If you do find this question-answer pair, it would be a massive breakthrough for science and philosophy more generally.
You say “demonstrably” but I still do not see a demonstration of these reasoning abilities that is not subject to the aforementioned criticisms.
This looks neat, but I don't think it meets the standard for "reasoning only" (still not sure how you would prove that one). Furthermore, it looks fairly generalizable in pattern and form from other grid problems, so I don't think it meets the bar for "not in the training data" either. We know these models can generalize somewhat from their training, but not consistently, and certainly not consistently well. Again, I'm not claiming that responding to a novel prompt is a sign of reasoning; as others have pointed out, a calculator can do that too.
Your quote:
“This is a unique problem I came up with. It’s a variation on counting islands.”
You then say:
“…as I came up with it so no variation of it really exists anywhere else.”
So I'm not sure what to take away from your text, but I do think this is a variation of a well-known problem type, so I would be pretty amazed if there weren't something very close to it in the training data. Given that it's an interview question, and those get written about ad nauseam, I'm not surprised it was able to generalize to the provided case.
The CoT researchers did see the ability to generalize in some cases; the models just didn't necessarily use the CoT tokens to reason, and/or they failed to generalize to variations the researchers thought they should handle, given their ability to generalize in others and the postulation that they were using reasoning rather than just a larger corpus to pattern-match against.
It's a variation on a well-known problem in the sense that I just added some unique rules to it.
The solution, however, is not a variation. It requires leaps of creativity that most people will be unable to make. In fact, I would argue this goes beyond mere reasoning, since you have to be creative and test possibilities to even arrive at a solution. It's almost random chance that gets you there; simple reasoning like logical deduction won't lead you to a solution.
Additionally, this question was developed to eliminate the pattern matching that candidates use in software interviews. It was vetted and verified not to exist elsewhere. No training data for it exists.
It definitively requires reasoning to solve. And it is also unlikely you solved it. ChatGPT o3 has solved it. Try it.
I did, and I fail to see how you can make those guarantees, given you give it as an interview question. You're able to vet the training data of o3? I still don't see how your answer could only be arrived at via reasoning, or that it would take "leaps of creativity" to reach the correct answer. These all seem like value judgments, not hard data or any proof that your question cannot be derived from the training data, given that you yourself say it's a variation of a problem that is in it.
It seems like you have an interview question, not "proof of reasoning," especially given the previously cited finding that these models can generalize in some cases with enough data.
> And it is also unlikely you solved it.

Well, I guess you overestimated your abilities on two counts today, then.
> It’s a variation on a well known problem in the sense that I just added some unique rules to it.
> No training data exists.
No, it definitely does exist; it's just a variation. You kind of just confirmed what we already knew: given enough data about a thing, these LLMs can generalize somewhat.
I don't think you solved it; otherwise you'd know that what I mean by "variation" is similar to how calculus is a variation of addition. Yes, it involves addition, but the solution is far more complicated.
Think of it like this: counting islands exists in the training data in the same way addition exists. The solution to this problem builds on counting islands in the same way calculus builds on addition.
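For anyone reading along who doesn't know the baseline being referenced: "counting islands" is the classic flood-fill interview problem. Here's a minimal sketch, assuming the standard formulation (1 = land, 0 = water, 4-directional connectivity); the function name and grid encoding are illustrative, not the poster's actual question:

```python
from typing import List

def count_islands(grid: List[List[int]]) -> int:
    """Count 4-connected groups of 1s (the classic 'number of islands')."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]

    def flood(r: int, c: int) -> None:
        # Iterative depth-first flood fill marking every cell of one island.
        stack = [(r, c)]
        while stack:
            r, c = stack.pop()
            if 0 <= r < rows and 0 <= c < cols and grid[r][c] == 1 and not seen[r][c]:
                seen[r][c] = True
                stack += [(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)]

    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1 and not seen[r][c]:
                count += 1   # first cell of a newly discovered island
                flood(r, c)  # absorb the rest so it's counted once
    return count

print(count_islands([[1, 1, 0],
                     [0, 0, 0],
                     [0, 1, 1]]))  # -> 2
```

Whatever the actual variant under dispute looks like, this flood-fill skeleton is presumably the "addition" in the analogy above.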
No training data exists for it to copy, because this problem was uniquely invented by me; the probability that the model has seen it is quite low. Additionally, several engineers and I have done extensive Google searches, and we believe to a reasonable degree that this problem does not exist anywhere else.
Also, you use semantics to obscure your meaning. LLMs can "generalize" somewhat? Generalization is one of those big words that isn't well defined. First off, the solution is not trivially extracted from counting islands, and second, "generalizing" is itself a form of reasoning. You're using big, fuzzy words with biased connotations to further your argument. But here's the thing: even if we generously go with it, the solution to counting donuts is clearly not some trivial generalization of counting islands. The problem is a variation, but the solution is NOT. It's not even close to the colloquial definition of "generalization."
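To make the disagreement concrete, and purely as an assumption since the actual problem statement never appears in this thread, suppose "counting donuts" meant counting islands that fully enclose at least one water region (a hole). Even that guessed variant needs an idea the plain flood fill above doesn't have, namely classifying water by whether it can reach the grid border. A minimal sketch under that assumption:

```python
from collections import deque
from typing import List, Set

def count_donuts(grid: List[List[int]]) -> int:
    """Hypothetical reading of 'counting donuts' (an assumption, not the
    poster's actual question): count islands adjacent to at least one
    water region that is sealed off from the grid border (a 'hole')."""
    rows, cols = len(grid), len(grid[0])
    STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

    # 1) Label each island (4-connected group of 1s) with an id.
    island = [[-1] * cols for _ in range(rows)]
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1 and island[r][c] == -1:
                stack = [(r, c)]
                island[r][c] = next_id
                while stack:
                    cr, cc = stack.pop()
                    for dr, dc in STEPS:
                        nr, nc = cr + dr, cc + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and grid[nr][nc] == 1 and island[nr][nc] == -1):
                            island[nr][nc] = next_id
                            stack.append((nr, nc))
                next_id += 1

    # 2) Flood water reachable from the border; whatever water remains
    #    unreached is a hole.
    open_water = [[False] * cols for _ in range(rows)]
    queue = deque((r, c) for r in range(rows) for c in range(cols)
                  if grid[r][c] == 0 and (r in (0, rows - 1) or c in (0, cols - 1)))
    for r, c in queue:
        open_water[r][c] = True
    while queue:
        cr, cc = queue.popleft()
        for dr, dc in STEPS:
            nr, nc = cr + dr, cc + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and not open_water[nr][nc]):
                open_water[nr][nc] = True
                queue.append((nr, nc))

    # 3) Islands touching hole water are "donuts"; count each once.
    donuts: Set[int] = set()
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 0 and not open_water[r][c]:
                for dr, dc in STEPS:
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 1:
                        donuts.add(island[nr][nc])
    return len(donuts)

# One donut (a ring of land around a hole) plus one plain island.
print(count_donuts([[1, 1, 1, 0],
                    [1, 0, 1, 0],
                    [1, 1, 1, 0],
                    [0, 0, 0, 1]]))  # -> 1
```

Whether the real problem's extra rules demand far more than this, as the poster claims, can't be judged from the thread alone.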
Did you solve it? I highly doubt you did. It's statistically more likely you're lying, and the fact that you call the solution a "generalization" just makes me suspect that even more.
Yep and yep. Did it on two models and by myself. You know, if you ask them to cite similar problems (and their sources), I think you'll quickly realize how derivative your question is in both question and solution. That you're now accusing me of arguing in bad faith, despite the fact that I've listened to you repeat the same point with the only proof being "this question is a head-scratcher for me; it must be for everyone else, therefore it proves that one must reason," makes me think you don't actually want to discuss anything; you think you can "prove" something and seem more interested in that. Given that, I say go publish a paper about your impossible question and let the rest of the community review it, if you feel like you need to prove something. So far the only thing you've proven to me is that you're not interested in a good-faith discussion, just repeating your dogma and hoping someone concedes.
Also, generalization is not always reasoning: I can make a generalization that is not reasoned, and I can make one that is poorly reasoned. Generalization is considered well defined in regard to reasoning:
https://www.comm.pitt.edu/reasoning
Your example still fails to actually demonstrate reasoning, though, given its highly derivative nature.
Yeah, I know you claimed to solve it. I'm saying I don't believe you, and I think you're a liar. There are various reasons why; the biggest one is that you think the solution is "generalizable" from counting islands (it's not).
That's not the point, though. The point is I have metrics on this: across roughly 50 interviews, only one candidate got it. So if you claim the solution is generalizable, then prove your claim. I have metrics that support mine. Where are yours?