
I was thinking the same about consistency, which is useful in many contexts, but if you let another LLM do the extraction, you are again at the mercy of that LLM's quality/hallucinations.

Of course, possible mitigation strategies would be using several different LLMs and comparing their results (voting), or using a highly trained/specialised model for entity/context extraction only. An interesting benchmark would be the point at which those extraction techniques/models exceed what a human professional can do.
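
A minimal sketch of the voting idea, assuming a generic call_llm wrapper and illustrative model names (none of these are tied to a specific provider); each model proposes entities, and only those a majority agrees on survive:

    # Sketch of majority-vote entity extraction across several LLMs.
    # call_llm and the model names are illustrative assumptions.
    import json
    from collections import Counter

    def extract_entities(call_llm, text, models):
        """Ask each model for entities, keep those a majority agrees on."""
        votes = Counter()
        for model in models:
            raw = call_llm(
                model=model,
                prompt=("List the named entities in the text below as a "
                        "JSON array of strings. Text:\n" + text),
            )
            try:
                entities = json.loads(raw)
            except json.JSONDecodeError:
                continue  # a malformed response simply loses its vote
            # each model votes at most once per entity
            votes.update({e.strip().lower() for e in entities
                          if isinstance(e, str)})
        threshold = len(models) // 2 + 1
        return [e for e, n in votes.items() if n >= threshold]

    # Usage: pass any function that sends a prompt to the named model
    # and returns its text completion.
    # agreed = extract_entities(my_llm_call, document_text,
    #                           models=["model-a", "model-b", "model-c"])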



> you are again at the mercy of that LLM's quality/hallucinations.

I guess I'm fine with a tall error bar - better than nothing.

> An interesting benchmark would be the point at which those extraction techniques/models exceed what a human professional can do.

Slashdot-style +1 Funny, -1 Inconsistent for RLHF maybe...?


Another approach I don't see enough is simply resampling and showing the human several 'final answers' when they are easy and quick to check for correctness.
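
A minimal sketch of that resampling pattern, assuming a sample_answer wrapper that returns one completion per call at nonzero temperature; duplicates are collapsed so the human only has to check distinct candidates:

    # Sketch of resampling: draw several answers, deduplicate, and let a
    # human pick the correct one. sample_answer is an assumed wrapper.
    def candidate_answers(sample_answer, question, n=5):
        seen = []
        for _ in range(n):
            answer = sample_answer(question).strip()
            if answer not in seen:
                seen.append(answer)
        return seen

    def ask_human(question, candidates):
        print(question)
        for i, c in enumerate(candidates, 1):
            print(f"  [{i}] {c}")
        choice = int(input("Which answer is correct (0 if none)? "))
        return candidates[choice - 1] if choice else None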





