>> It is already demonstrating general capabilities and performing a wide range ...

simonw · 2025-04-20T18:22:50 1745173370

You can argue that everything output by an LLM is hallucinated, since there's no difference under the hood between outputting useful information and outputting hallucinations.

The quality of the LLM then becomes how often it produces useful information. That score has gone up a lot in the past 18 months.

(Sometimes hallucinations are what you want: "Tell me a fun story about a dog learning calculus" is a valid prompt which mostly isn't meant to produce real facts about the world.)

codr7 · 2025-04-20T19:00:08 1745175608

Isn't it the case that the latest models actually hallucinate more than the ones that came before, despite best efforts to prevent it?

simonw · 2025-04-20T20:01:48 1745179308

The o3 model card reports a so far unexplained uptick in hallucination rate from o1 - on page 4 of https://cdn.openai.com/pdf/2221c875-02dc-4789-800b-e7758f372...

That is according to one specific internal OpenAI benchmark; I don't know if it's been replicated externally yet.

bbor · 2025-04-20T18:01:21 1745172081

The critical discovery was a way to crack the “Frame Problem”, which roughly comes down to colloquial notions of common sense or intuition. For the first time ever, we have models that know that if you jump off a stool, you will (likely!) be standing on the ground afterwards.

In that sense, they absolutely know things that aren’t in their training data. You’re correct about factual knowledge, tho; that’s why they’re not trained to optimize for it! A database (or PageRank?) solves that problem already.