> If the verification systems for LLMs are not built out of LLMs and they're somehow more robust than LLMs at human-language problem solving and analysis, then you should be using the technology the verification system uses instead of LLMs in the first place!
The issue is not the verification system, but putting quantifiable bounds on your answer set. If I ask an LLM to multiply large numbers together, I can very easily verify the generated answer by checking it against a deterministic function.
I.e. rather than hoping that an LLM can accurately multiply two 10-digit numbers, I have a much easier (and verifiable) solution: ask it to perform the calculation using Python and read me the output.
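A minimal sketch of both ideas, assuming a placeholder `ask_llm` function standing in for whatever client you use: the first function checks a claimed answer against a deterministic computation, the second skips asking for the answer at all and has the model hand back a Python expression that a deterministic runtime evaluates.

```python
# Sketch only: `ask_llm` is a stand-in for whatever LLM client is in use.

def ask_llm(prompt: str) -> str:
    """Placeholder: send the prompt to an LLM and return its raw text reply."""
    raise NotImplementedError


def verify_llm_multiplication(a: int, b: int) -> bool:
    """Ask the model for the product, then check it with a deterministic function."""
    reply = ask_llm(f"What is {a} * {b}? Reply with only the number.")
    claimed = int(reply.strip().replace(",", ""))
    return claimed == a * b  # ground truth is one cheap multiplication away


def multiply_via_python(a: int, b: int) -> int:
    """Don't ask the model to do arithmetic: ask it for code, run the code."""
    code = ask_llm(
        f"Write a single Python expression that computes {a} * {b}. "
        "Reply with only the expression, no prose."
    )
    # A real system would sandbox this; bare eval keeps the sketch short.
    return eval(code, {"__builtins__": {}}, {})
```

The second pattern is what code-interpreter style tool use amounts to in practice: the model's only job is translating the request into something a deterministic runtime can execute and check.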