
> Specifically, you have no way of knowing the difference between accurate outputs and inaccurate outputs, without running the command yourself, making it largely worthless.

The former is necessarily the case given the Halting Problem; the latter is falsified by the fact we can reason about code despite the Halting Problem.



> the latter is falsified by the fact we can reason about code despite the Halting Problem.

I'm not talking in general terms, or describing 'what the code does' in a summary or bullet point high-level form. No one is arguing that it can't summarize and describe what code does. These models are very good at that.

I'm talking specifically about generating the output of the command, as the OP specifically mentioned.

It does generate the exact output for commands and scripts if you request it (sometimes even if you don't); it's just that the output is, often, hallucinated rubbish.

Being impressed that GPT can invent, from thin air, some creative writing (fiction) when you tell it `pretend you're a docker container and now run 'ls'` is, I feel, missing the boat in terms of understanding the actual capabilities of these LLMs.


And they're often not, even for quite complex functions that require symbolically executing quite a few calculations to arrive at the correct output.
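A made-up sketch of the kind of function I mean (hypothetical, purely for illustration): getting the printed value right requires faithfully simulating every loop iteration, not just summarizing what the code does.

```
def mix(n: int) -> int:
    # Small but branch-heavy accumulator: predicting the result
    # means tracking 'acc' through every one of the n iterations.
    acc = 7
    for i in range(1, n + 1):
        if i % 3 == 0:
            acc = acc * 2 - i
        else:
            acc += i * i
    return acc

print(mix(10))  # answering correctly requires simulating all 10 steps
```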

Nobody is impressed that it "can invent from 'thin air' some creative writing (fiction)"; the impressive part is that it often does not, and in fact produces correct output. You're right that we can't rely on it producing the correct output as things currently stand, but that it is capable of doing this at all is impressive.


>> without running the command yourself,

> the latter is falsified by the fact we can reason about code despite the Halting Problem

I think wokwokwok's point holds true in practice.

Our patience and working memory are far more limited than what is needed to accurately model all the necessary details of even moderately complex algorithms in our heads.

That's one of the main reasons to limit code complexity: it improves readability and maintainability.

https://en.wikipedia.org/wiki/Cyclomatic_complexity
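As a toy illustration (my own sketch, not from the linked article), every decision point adds another path you have to hold in working memory while tracing the code:

```
# Cyclomatic complexity = decision points + 1.
# Here: for (+1), if (+1), elif (+1), so complexity = 4,
# i.e. four independent paths to keep track of.
def classify(xs):
    total = 0
    for x in xs:
        if x < 0:
            total -= x
        elif x > 100:
            total += 100
        else:
            total += x
    return total
```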


Can you explain how the halting problem applies here?


Suspect this is a troll posting, but on the off chance I'm wrong... The LLM gives the output of a command. To do so, it has to be able to determine whether the command ever exits. This is exactly the halting problem.

For a trivial example, what is the output of:

```
while True:
    pass

print("goodbye world")
```

(this is also proof that leaving out the curly braces makes code harder instead of simpler #python-lie-to-me; it took multiple edits to get this to render correctly on HN)


Rice's theorem may actually be more appropriate here (it follows from the undecidability of the halting problem).

But it's important to note that just because there's no algorithm that works on ALL programs doesn't mean the semantic properties of every program are undecidable. For programs that are bounded and guaranteed to terminate (e.g. no unbounded loops or recursion allowed), we clearly can determine such properties, and I believe theorem provers in fact only accept such programs. Similarly, you can restrict yourself to programs you can prove will terminate within N steps (which may exclude some programs that do terminate but require more than N steps of compute to prove it).
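A minimal sketch of that distinction (my own example, not anyone's claim from upthread): the first function's output is decidable by simply running it for its known bound, while the second has no known bound, so no general procedure can promise to produce its output.

```
# Bounded: the loop runs exactly 1_000 times, so the output can always
# be decided by simulating at most 1_000 steps.
def bounded() -> int:
    total = 0
    for i in range(1_000):
        total += i
    return total

# Unbounded: whether this returns for every n is an open question
# (a Collatz-style search), so there is no known bound on how long
# you'd have to simulate before you could report its output.
def unbounded(n: int) -> int:
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps
```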



