
I disagree. Claude may fail at running a vending machine business, but I have used it to read 10-K reports and found it to be really good. There is a wealth of information in public filings that is legally required to be accurate but is often obfuscated in footnotes. I had an accounting professor who used to say the secret was reading (and understanding) the footnotes.

That’s a huge pain in the neck if you want to compare companies, and worse if they are in different regulatory regimes. That’s the kind of thing I have found LLMs to be really good for (rough sketch of the approach below).
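
For anyone curious what that looks like in practice, here is a rough sketch of the prompt-level approach. ask_llm is a stand-in for whatever model call you actually use; the point is forcing the model to quote and name the notes it relied on rather than summarize from memory.

    # Sketch: comparing footnote disclosures across two 10-K filings with an LLM.
    # ask_llm is a placeholder for your chat-completion call of choice;
    # the prompt structure (quote the footnote, name the note) is the point.

    from typing import Callable

    PROMPT = """You are reading excerpts from two 10-K filings.
    For each company, list any one-time gains, losses, or accounting changes
    disclosed in the notes. Quote the exact sentence you relied on and name
    the note/section it came from. If something is not stated, say so.

    --- Company A notes ---
    {notes_a}

    --- Company B notes ---
    {notes_b}
    """

    def compare_footnotes(notes_a: str, notes_b: str,
                          ask_llm: Callable[[str], str]) -> str:
        """Build a grounded prompt from the two notes sections and query the model."""
        return ask_llm(PROMPT.format(notes_a=notes_a, notes_b=notes_b))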



For example, UnitedHealth buried in its financials that it hit its numbers by exiting equity positions.

It then _didn’t_ include a similar transaction (losing $7bn by exiting Brazil).

This was stuck in footnotes that many people who follow the company didn’t pick up.

https://archive.ph/fNX3b


How would someone using an LLM to explore the reports find such a thing?


This is why it’s important to follow the studies comparing LLMs’ performance on “needle-in-a-haystack” style tasks. They tend to be pretty good at finding the one thing wrong in a large corpus of text, though it depends on the model, the variant (Sonnet, Opus, 8B, 27B, etc.), and the size of the corpus, and there are occasional performance cliffs.
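
If you want to sanity-check a model yourself, the benchmark idea is easy to reproduce in miniature: hide one odd sentence in a long filler document and see whether the model can surface it. Rough sketch below; ask_llm is a placeholder for your actual model call, and the "needle" fact is made up.

    # Toy needle-in-a-haystack check, in the spirit of the published benchmarks:
    # bury one unusual disclosure in a long filler document and see whether the
    # model can quote it back.

    import random
    from typing import Callable

    def needle_test(ask_llm: Callable[[str], str],
                    filler_paragraphs: int = 500) -> bool:
        needle = "The warehouse in Duluth was written down to zero in Q3."
        filler = "The company continued normal operations during the period. "
        paragraphs = [filler * 5 for _ in range(filler_paragraphs)]
        paragraphs.insert(random.randrange(len(paragraphs)), needle)
        haystack = "\n\n".join(paragraphs)

        answer = ask_llm(
            "The following document contains exactly one unusual disclosure. "
            "Quote it verbatim.\n\n" + haystack
        )
        return "Duluth" in answer  # crude pass/fail for the sketch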


Did you go and look at the correctness of the information?

Because I have seen Claude, as recently as a week ago, completely invent and cite whole non-existent paragraphs from the documentation of some software I know well. Only because of that was I able to notice...


All models hallucinate. The likelihood of hallucinations is, however, strongly influenced by the way you prompt and construct your context.

But even if a human went through the documents by hand and did the analysis themselves, they would still be likely to make mistakes. That's why we usually frame the scientific method as making falsifiable claims, which you then try to disprove in order to make sure they're correct (a cheap version of that check is sketched below).

And if you can't do that, then you're always walking on thin ice, whatever tool or methodology you choose to use for the analysis.


> The likelihood of hallucinations is, however, strongly influenced by the way you prompt and construct your context.

Show me the research supporting this argument. So far, RAG and similar approaches are what limit hallucinations.


Are you seriously unaware of what RAG is and still speaking with authority on the topic?

It's automatically retrieving information and adding it to the context. It's, in spirit, a convenience feature so you don't have to provide that information manually in the prompt. It's just a lot harder to pull off well automatically, but the fundamental practice is "just" context optimization (see the sketch below).

You're essentially saying "but that's not driving!" after someone goes by in an EV, because it isn't an ICE.
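
To make the mechanics concrete, here is a bare-bones sketch of the idea: fetch relevant text, then put it in the context before asking. Real systems use embeddings and a vector store; plain keyword overlap is used here only to keep the sketch self-contained, and ask_llm stands in for whatever model call you use.

    # Minimal retrieve-then-ask loop: the "retrieval" is crude on purpose.

    from typing import Callable

    def retrieve(chunks: list[str], question: str, k: int = 3) -> list[str]:
        """Rank chunks by word overlap with the question, return the top k."""
        q_words = set(question.lower().split())
        scored = sorted(chunks,
                        key=lambda c: len(q_words & set(c.lower().split())),
                        reverse=True)
        return scored[:k]

    def rag_answer(chunks: list[str], question: str,
                   ask_llm: Callable[[str], str]) -> str:
        # Stuff the retrieved chunks into the context, then ask the question.
        context = "\n\n".join(retrieve(chunks, question))
        return ask_llm(f"Answer using only this context:\n\n{context}\n\n"
                       f"Question: {question}")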


Not the same: "RAG vs. Long-context LLMs" - https://www.superannotate.com/blog/rag-vs-long-context-llms


You're literally linking to an article that confirms what I said. Yes, a model with RAG can perform with a much smaller context size.

That doesn't mean RAG isn't context optimization.


Did you make any technical argument about how to reduce hallucinations? Because I fail to see one from you in this thread except "it's the fault of your prompt".


> I had an accounting professor that used to say the secret was reading (and understanding) the footnotes.

He must have passed this secret knowledge on, as they all say it now...


It's mostly good, but one mistake can burn you severely.


A good bit of old advice is to read the notes first.


Would anyone pay for an LLM that can parse 10-K reports hallucination-free?

I was exploring this idea recently; maybe I should ship it.


Grok 4 SuperHeavy can almost certainly do this out of the box?


I haven't tried SuperHeavy, but why would it? All transformer-based LLMs are pretty prone to hallucinations, even with RAG... it can be pretty good, I guess.

Any articles to learn more about it?



