
I was thinking the same about consistency, which is useful in many contexts, but if you let another LLM do the extraction, you are again at the mercy of that LLM's quality/hallucinations.

Of course, possible mitigation strategies would be using several different LLMs and comparing their results (voting), or using a highly trained/specialised model for entity/context extraction only. An interesting benchmark would be the point at which those extraction techniques/models exceed what a human professional can do.
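
A minimal sketch of the voting idea, assuming a generic call_llm wrapper and illustrative model names (none of these are tied to a specific provider); each model proposes entities, and only those a majority agrees on survive:

    # Sketch of majority-vote entity extraction across several LLMs.
    # call_llm and the model names are illustrative assumptions.
    import json
    from collections import Counter

    def extract_entities(call_llm, text, models):
        """Ask each model for entities, keep those a majority agrees on."""
        votes = Counter()
        for model in models:
            raw = call_llm(
                model=model,
                prompt=("List the named entities in the text below as a "
                        "JSON array of strings. Text:\n" + text),
            )
            try:
                entities = json.loads(raw)
            except json.JSONDecodeError:
                continue  # a malformed response simply loses its vote
            # each model votes at most once per entity
            votes.update({e.strip().lower() for e in entities
                          if isinstance(e, str)})
        threshold = len(models) // 2 + 1
        return [e for e, n in votes.items() if n >= threshold]

    # Usage: pass any function that sends a prompt to the named model
    # and returns its text completion.
    # agreed = extract_entities(my_llm_call, document_text,
    #                           models=["model-a", "model-b", "model-c"])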



> you are again at the mercy of that LLM's quality/hallucinations.

I guess I'm fine with a tall error bar - better than nothing.

> An interesting benchmark would be the point at which those extraction techniques/models exceed what a human professional can do.

Slashdot-style +1 Funny, -1 Inconsistent for RLHF maybe...?


Another approach I don't see enough is simply resampling and showing the human several 'final answers' when they are easy and quick to check for correctness.
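
A minimal sketch of that resampling pattern, assuming a sample_answer wrapper that returns one completion per call at nonzero temperature; duplicates are collapsed so the human only has to check distinct candidates:

    # Sketch of resampling: draw several answers, deduplicate, and let a
    # human pick the correct one. sample_answer is an assumed wrapper.
    def candidate_answers(sample_answer, question, n=5):
        seen = []
        for _ in range(n):
            answer = sample_answer(question).strip()
            if answer not in seen:
                seen.append(answer)
        return seen

    def ask_human(question, candidates):
        print(question)
        for i, c in enumerate(candidates, 1):
            print(f"  [{i}] {c}")
        choice = int(input("Which answer is correct (0 if none)? "))
        return candidates[choice - 1] if choice else None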





