> And where that another LLM will get the capability to judge consistency? Had fun recently prompting different models with "a story about everyday life of <some stereotypical character>".
> Absolutely no awareness about anything spatial - what is in which place, who is where, body positions.
These are some of the things I'd like to at least attempt to quantify - LLMs are better at translation than thinking[0], so the idea was to translate the slop into something that can then be analyzed by a non-LLM tool. E.g. translate the story into a Prolog fact database and generate a couple of queries for each paragraph, each chapter, or some combination of these. Rough idea, just something I haven't seen done.
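To make the rough idea a bit more concrete, here's a minimal sketch in Python rather than Prolog, just to show the shape of it. Everything in it is made up for illustration: the (paragraph, character, location) schema, the sample facts, and the `find_location_conflicts` helper. In the real thing the facts would come from the LLM's translation of the story, and the check would be one query among many.

```python
from collections import defaultdict

# Hypothetical facts an LLM might extract from a story, one tuple per claim:
# (paragraph, character, location). The schema is an assumption, not a standard.
facts = [
    (1, "anna", "kitchen"),
    (1, "bob", "garden"),
    (2, "anna", "kitchen"),
    (2, "anna", "attic"),  # same character, same paragraph, second location
    (3, "bob", "garden"),
]


def find_location_conflicts(facts):
    """Map (paragraph, character) to its sorted locations when more than one is claimed."""
    seen = defaultdict(set)
    for paragraph, character, location in facts:
        seen[(paragraph, character)].add(location)
    return {key: sorted(locs) for key, locs in seen.items() if len(locs) > 1}


# Report every spot where the extracted facts disagree about where someone is.
for (paragraph, character), locations in find_location_conflicts(facts).items():
    print(f"paragraph {paragraph}: {character} is claimed to be in {locations}")
```

This is deliberately naive: a character can legitimately move within a paragraph, so a usable version would probably need explicit movement events or finer-grained time steps. The point is only that the consistency check itself runs without an LLM in the loop.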
[0] don't @ me