
We use probability. Find a prompt that has a large range, aka codomain. If the model arrives at the correct answer, the only plausible explanation is reasoning, because the codomain is so large that it cannot land there by random chance.

Of course, make sure the prompt is unique, so that it isn't in the training data and the model isn't doing any sort of "pattern matching".

So, like all science, we prove it via probability: observations match the theory to a statistical degree.
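To sketch the statistics with made-up numbers (the codomain size and trial counts below are assumptions, not measurements): if the answer space has N equally likely values, the probability of guessing right repeatedly collapses fast.

    import math

    N = 10**12            # assumed size of the answer space (codomain)
    p = 1 / N             # chance of guessing any single prompt correctly
    n, k = 20, 18         # hypothetical: 18 of 20 fresh prompts answered right

    # One-sided binomial tail: P(at least k correct by pure guessing)
    p_value = sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
                  for i in range(k, n + 1))
    print(p_value)        # ~2e-214: far too small to be random chance

That's the whole argument: the observation is wildly improbable under the chance hypothesis, so we reject it.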




Pardon my ignorance -- assuming that range and codomain are approximately equivalent in this context, how do you specify a prompt with a large codomain? Is there a canonical example of a prompt with a large codomain?

It seems to me that, in natural language, the size of the codomain is related to the specificity of the prompt. For instance, if the prompt is "We are going to ..." then the codomain is enormous. But if the prompt is "2 times 2 is..." the codomain is, mathematically, {4, four}, some series of four symbols (e.g. IIII), or some other representation of the concept of "4" (i.e. different base or language representations: 0x04, 0b100, quatro, etc.).
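(A quick sanity check that those spellings really are one element of the codomain; the Roman numeral is just mapped by hand:)

    # Several surface spellings, one abstract value:
    forms = {"4": int("4"), "0x04": int("0x04", 16),
             "0b100": int("0b100", 2), "IIII": 4}  # Roman numeral by hand
    assert len(set(forms.values())) == 1           # all denote 4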

But if this is the case, a broad codomain is approximately synonymous with "no correct answer" or "the result is widely interpretable", which implies that the larger the codomain, the easier it is to claim an answer "correct" in the context of the prompt.

How do you reconcile loose interpretability with statistical rigor?


You'll have to drop a bit of rigor here.

I ask the question "what is 2 * 2", which is obviously a loaded question that's been pattern-matched to death.

The LLM can answer "4", "The answer is 4", or "looks like the answer is 4".

All are valid answers, and all are the same: we count all three of those answers as just "4" out of the set of numbers. But we have to use our own language faculties to cut through the noise of the language itself.
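Concretely, something like this is what I mean by cutting through the noise (a toy sketch; the regex and the word list are just my assumptions):

    import re

    NUMBER_WORDS = {"four": 4}              # extend as needed; assumed lookup

    def canonical_answer(text):
        """Collapse an LLM's surface wording to the number it asserts."""
        t = text.lower()
        m = re.search(r'-?\d+', t)          # first integer in the reply
        if m:
            return int(m.group())
        for word, value in NUMBER_WORDS.items():
            if word in t:
                return value
        return None                         # couldn't extract an answer

    # All three phrasings collapse to the same element of the codomain:
    for reply in ["4", "The answer is 4", "looks like the answer is 4"]:
        assert canonical_answer(reply) == 4

Once answers are canonicalized like this, the counting in the probability argument is well defined.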


> I ask the question "what is 2 * 2", which is obviously a loaded question that's been pattern-matched to death.

Yeah, that was my point. Small codomain -> easy to validate. Large codomain -> open to interpretation. You implied that to prove reasoning, you pick a prompt with a large codomain, and if the LLM answers with accurate precision, then voilà, reasoning.

So my question was: can you give an example of a prompt with a large codomain that isn't subject to wide interpretation? It seems the wider the codomain, the easier it is to say, "look! reasoning!"


Pick a prompt with a wide codomain but a single answer. That’s reasoning if it can get the answer right.


Your original claim was that an LLM can reason. And you say it can be proven by picking one of these prompts with a large codomain that has a precise answer which requires reason. If an LLM can come to a specific answer out of a huge codomain, and that answer requires reason, you claim that proves reasoning. Do I have that right?

So my question is, and has been for three replies now: can you give any example of one of these prompts?




