
I disagree. Claude may fail at running a vending machine business, but I have used it to read 10-K reports and found it to be really good. There is a wealth of information in public filings that is legally required to be accurate but is often obfuscated in footnotes. I had an accounting professor who used to say the secret was reading (and understanding) the footnotes.

That’s a huge pain in the neck if you want to compare companies, and worse if they are in different regulatory regimes. That’s the kind of thing I have found LLMs to be really good for (rough sketch of the approach below).
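
For anyone curious what that looks like in practice, here is a rough sketch of the prompt-level approach. ask_llm is a stand-in for whatever model call you actually use; the point is forcing the model to quote and name the notes it relied on rather than summarize from memory.

    # Sketch: comparing footnote disclosures across two 10-K filings with an LLM.
    # ask_llm is a placeholder for your chat-completion call of choice;
    # the prompt structure (quote the footnote, name the note) is the point.

    from typing import Callable

    PROMPT = """You are reading excerpts from two 10-K filings.
    For each company, list any one-time gains, losses, or accounting changes
    disclosed in the notes. Quote the exact sentence you relied on and name
    the note/section it came from. If something is not stated, say so.

    --- Company A notes ---
    {notes_a}

    --- Company B notes ---
    {notes_b}
    """

    def compare_footnotes(notes_a: str, notes_b: str,
                          ask_llm: Callable[[str], str]) -> str:
        """Build a grounded prompt from the two notes sections and query the model."""
        return ask_llm(PROMPT.format(notes_a=notes_a, notes_b=notes_b))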



For example, UnitedHealth buried in its financials that it hit its numbers by exiting equity positions.

It then _didn’t_ include a similar transaction (losing $7bn by exiting Brazil).

This was stuck in footnotes that many people who follow the company didn’t pick up.

https://archive.ph/fNX3b


How would someone using an LLM to explore the reports find such a thing?


This is why it’s important to follow the studies comparing LLMs’ performance on “needle-in-a-haystack” style tasks. They tend to be pretty good at finding the one thing wrong in a large corpus of text, though it depends on the model, the variant (Sonnet, Opus, 8B, 27B, etc.), and the size of the corpus, and there are occasional performance cliffs.
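
If you want to sanity-check a model yourself, the benchmark idea is easy to reproduce in miniature: hide one odd sentence in a long filler document and see whether the model can surface it. Rough sketch below; ask_llm is a placeholder for your actual model call, and the "needle" fact is made up.

    # Toy needle-in-a-haystack check, in the spirit of the published benchmarks:
    # bury one unusual disclosure in a long filler document and see whether the
    # model can quote it back.

    import random
    from typing import Callable

    def needle_test(ask_llm: Callable[[str], str],
                    filler_paragraphs: int = 500) -> bool:
        needle = "The warehouse in Duluth was written down to zero in Q3."
        filler = "The company continued normal operations during the period. "
        paragraphs = [filler * 5 for _ in range(filler_paragraphs)]
        paragraphs.insert(random.randrange(len(paragraphs)), needle)
        haystack = "\n\n".join(paragraphs)

        answer = ask_llm(
            "The following document contains exactly one unusual disclosure. "
            "Quote it verbatim.\n\n" + haystack
        )
        return "Duluth" in answer  # crude pass/fail for the sketch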


Did you go and look at the correctness of the information?

Because I have seen Claude, as recently as a week ago, completely invent and cite whole non-existent paragraphs from the documentation of some software I know well. Only because of that was I able to notice...


All models hallucinate. The likelihood of hallucinations is, however, strongly influenced by the way you prompt and construct your context.

But even if a human went through the documents by hand and did the analysis themselves, they would still be likely to make mistakes. That's why we usually frame the scientific method as making falsifiable claims, which you then try to disprove in order to make sure they're correct (a cheap version of that check is sketched below).

And if you can't do that, then you're always walking on thin ice, whatever tool or methodology you choose to use for the analysis.


> The likelihood of hallucinations is, however, strongly influenced by the way you prompt and construct your context.

Show me the research supporting this argument. So far, RAG and similar approaches are what limit hallucinations.


Are you seriously unaware of what RAG is and still speaking with authority on the topic?

It's automatically retrieving information and adding it to the context. It's, in spirit, a convenience feature so you don't have to provide that information manually in the prompt. It's just a lot harder to pull off well automatically, but the fundamental practice is "just" context optimization (see the sketch below).

You're essentially saying "but that's not driving!" after someone goes by in an EV, because it isn't an ICE.
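
To make the mechanics concrete, here is a bare-bones sketch of the idea: fetch relevant text, then put it in the context before asking. Real systems use embeddings and a vector store; plain keyword overlap is used here only to keep the sketch self-contained, and ask_llm stands in for whatever model call you use.

    # Minimal retrieve-then-ask loop: the "retrieval" is crude on purpose.

    from typing import Callable

    def retrieve(chunks: list[str], question: str, k: int = 3) -> list[str]:
        """Rank chunks by word overlap with the question, return the top k."""
        q_words = set(question.lower().split())
        scored = sorted(chunks,
                        key=lambda c: len(q_words & set(c.lower().split())),
                        reverse=True)
        return scored[:k]

    def rag_answer(chunks: list[str], question: str,
                   ask_llm: Callable[[str], str]) -> str:
        # Stuff the retrieved chunks into the context, then ask the question.
        context = "\n\n".join(retrieve(chunks, question))
        return ask_llm(f"Answer using only this context:\n\n{context}\n\n"
                       f"Question: {question}")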


Not the same: "RAG vs. Long-context LLMs" - https://www.superannotate.com/blog/rag-vs-long-context-llms


You're literally linking to an article that confirms what I said. Yes, a model with RAG can perform with a much smaller context size.

That doesn't mean RAG isn't context optimization.


Did you make any technical argument about how to reduce hallucinations? Because I fail to see one from you in this thread except "it's the fault of your prompt".


> I had an accounting professor that used to say the secret was reading (and understanding) the footnotes.

He must have passed this secret knowledge on, as they all say it now...


It's mostly good, but one mistake can burn you severely.


A good bit of old advice is to read the notes first.


Would anyone pay for an LLM that can parse 10-K reports hallucination-free?

I was exploring this idea recently; maybe I should ship it.


Grok 4 SuperHeavy can almost certainly do this out of the box?


I haven't tried SuperHeavy, but why would it? All transformer-based LLMs are pretty prone to hallucinations, even with RAG... it can be pretty good, I guess.

Any articles to learn more about it?



