You joke, but this is unreasonably effective. We're prototyping using LLMs to extract, among other things, names from arbitrary documents.
Asking the LLM to read the text and output all the names it found -> it gets the names, but there are lots of false positives.
Asking the LLM to then classify the list of candidate names it found as either name / not name -> damn near perfect.
Playing around with it, it seems that the more text it has to read, the worse it performs at following instructions, so a low-accuracy pass over a lot of text followed by a high-accuracy pass over a much smaller set of data is the way to go. A rough sketch of that two-pass setup is below.
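For anyone curious, here's roughly what that looks like (a minimal sketch, assuming the OpenAI chat completions client; the model name and prompts are placeholders, not the ones the parent actually used):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set; any chat-capable client works

def ask(prompt: str) -> str:
    # Single LLM call; the model name is a placeholder, swap in whatever you use.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content or ""

def extract_candidates(document: str) -> list[str]:
    # Pass 1: low-precision extraction over the whole document.
    out = ask("List every personal name in the following text, one per line:\n\n" + document)
    return [line.strip() for line in out.splitlines() if line.strip()]

def filter_candidates(candidates: list[str]) -> list[str]:
    # Pass 2: high-precision name / not-name check on each short candidate string.
    kept = []
    for c in candidates:
        verdict = ask(f"Is '{c}' a person's name? Answer exactly 'name' or 'not a name'.")
        if verdict.strip().lower() == "name":
            kept.append(c)
    return kept

def extract_names(document: str) -> list[str]:
    return filter_candidates(extract_candidates(document))
```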
What's your false negative rate? Also, where does it occur: is it the first LLM that omits names, or the second LLM that incorrectly classifies words as "not a name" when they are in fact names?