You know everything is made up, right? And yet it just works. I too use a confidence score in a bug-finder app, GitHub seems to use them in Copilot reviews, and people will keep using them until it's shown not to work anymore.
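For what it's worth, the scheme I mean is nothing fancy — have the model emit a confidence field alongside each finding and drop anything below a cutoff. A minimal sketch (field names and the 0.7 threshold are just illustrative, not from any real app):

```python
# Sketch: filter LLM bug-finder findings by self-reported confidence.
# `findings` is assumed to be parsed from the model's structured output,
# e.g. [{"issue": "...", "confidence": 0.0-1.0}, ...].

CONFIDENCE_THRESHOLD = 0.7  # illustrative cutoff, tuned per app


def filter_findings(findings, threshold=CONFIDENCE_THRESHOLD):
    """Keep only findings the model rated at or above the threshold."""
    return [f for f in findings if f.get("confidence", 0.0) >= threshold]


findings = [
    {"issue": "possible null deref", "confidence": 0.92},
    {"issue": "style nit", "confidence": 0.35},
    {"issue": "off-by-one in loop", "confidence": 0.71},
]
kept = filter_findings(findings)
print(kept)  # the two findings at or above 0.7 survive
```

Whether the numbers the model emits mean anything is exactly the open question, but as a cheap filter it's hard to beat.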
On the other hand, this post https://www.greptile.com/blog/make-llms-shut-up says it didn't work in their case:
> Sadly, this also failed. The LLMs judgment of its own output was nearly random. This also made the bot extremely slow because there was now a whole new inference call in the workflow.