> And where that another LLM will get the capability to judge consistency? Had f...

> And where that another LLM will get the capability to judge consistency? Had fun recently prompting different models with "a story about everyday life of <some stereotypical character>". > Absolutely no awareness about anything spatial - what is in which place, who is where, body positions.

These are some things I'd like to at least attempt to quantify - LLMs are better at translation than thinking[0], so the idea was to translate the slop into something that can be then analyzed by a non-LLM tool. E.g. translate the story into a prolog fact database and generate a couple queries for each paragraph/chapter/combination of these. Rough idea, just something I haven't seen done.

[0] don't @ me