Would be interesting if they'd add another one on non-fiction creative writing. For example, turning a set of investigative notes and findings into a Pulitzer-prize winning article that wouldn't be out of place in a renowned, high-quality newspaper.
IME, for LLMs (just like humans) this skill doesn't necessarily correlate with fiction writing prowess.
This is probably harder to judge automatically (i.e. using LLMs) though, maybe that's why they haven't done it.
>This is probably harder to judge automatically (i.e. using LLMs) though, maybe that's why they haven't done it.
Absolutely.
Firstly, if investigative notes and findings on a real-life event are available, it's entirely possible the model was trained on the actual published article. If so, it might just replicate it.
Plus, this might expose how some LLMs still cook things up even when given facts to rely on. Some models are more notorious for this than others. Anecdotally, Perplexity did that quite a bit for me a year or so ago, though it has gotten a lot better since.
Then there's the fact that judging what counts as Pulitzer-prize-winning writing is itself subjective.
In my mind, I was thinking of giving a set of fictional findings, not realizing that would technically make it... fiction!
I think it's fine to have fictional notes. It's still a very different task from, e.g., writing a fantasy novel, which is roughly what these benchmarks are about. Instead, the task would be to turn a given set of facts on a real-world topic into a high-quality, serious article.
> Then even writing what some might consider Pulitzer prize winning is a subjective task.
This applies just as much to the "short fantasy story" tasks in these benchmarks.