
LLMs suffer from the "Igon Value Problem" https://rationalwiki.org/wiki/Igon_Value_Problem

Similar to reading a pop sci book, you're getting entertainment, not an education, from a thing with no actual understanding of the source material.



Earlier in this thread, people mention the counterpoint to this: they Google the information from the LLM and do more reading. It's an excellent starting point for researching a topic: you can't trust everything it says, but if you don't know where to start, it will very likely get you to a good place to start researching.

Similarly, while you can't fully trust everything a journalist says, it's obviously better to have journalism than to have nothing: the "Igon Value Problem" doesn't mean that journalism should be eradicated. Before LLMs, we really had nothing that filled this role.


> they Google the information from the LLM and do more reading

The runway on this one seems to be running out fast - how long before all the Google results are also non-expert opinions regurgitated by LLMs?


People are forgetting about content farms like Associated Content [1]. Starting in the early aughts, these content farms would happily produce expert-sounding content on anything people were searching for. They would buy top search terms from search engines like Yahoo, hire English majors for dirt cheap, and have them produce "expert" content targeting those search terms. At least the LLMs have been trained on some relevant data!

[1] https://en.wikipedia.org/wiki/Yahoo_Voices


So with AI, Google has cut out the middleman and insourced the content farm.


The way I see it, they have been like that for at least a decade. Of course, before the transformer revolution these were generated in a cruder way, but still, 99% of Google results for any topic have been trash for me since the early 2000s.

Google gave up on fighting the SEO crowd a long time ago. I worry they'll give up on the entire idea of search and just serve answers from their LLM.


You can turn to actual experts, e.g. on YouTube or in books. But yes, I recently had the misfortune of working with a personal trainer who was using ChatGPT to come up with training programs, and it felt confusing, like I was wasting time and money.


When I'm looking for actual experts, the first thing that comes to my mind is definitely YouTube!!

At least when it's about YouTube-specific topics, like where the like button and the subscribe button are.

They will tell me. Every. Single. F*cking. 5. Minute. Clip. Again. And. Again.

Not soooo much for anything actually important or interesting, though.... ;)

PS: Also which of the always same ~5 shady companies their sponsor is, of course.


Unironically, YouTube is a great place to find actual experts on a given subject.


But he explicitly mentions books. That contrast makes it interesting. I assume that he is explicitly fine with text content.

And then he does not mention the web in general (or even Reddit - it wouldn't be worth more than an eyeroll to me), but YouTube.

On the one hand, yeah, well, the web was probably in better shape in the past. (And YT is even a major aspect of that, imho, but anyways...) On the other hand, you really must be a die-hard YT fanatic to mention only that single website (which, by the way, is mostly video clips and has all the issues of the entire web), instead of just the web.

It's really well outside of the sphere of my imagination. The root cause of my reply wasn't even disagreement at first, but surprise and confusion.


You've made an error here...

>They will tell me. Every. Single. F*cking. 5. Minute. Clip. Again. And. Again.

Do you know why you got that video? Because people liked and subscribed to them, while the 'experts' with the best information in the universe are hidden 5000 videos below with 10 views.

And this is 100% Google's fault for the algorithms they created that force these behaviors on anyone who wants to use their platform and have visibility.

Lastly, if you can't find anything interesting or important on YT, that points to a failure of your own. While there is an ocean of crap, there is more than enough amazing content out there.


Yeah, well, I never said that there aren't any experts in any topic who at some point decided to publish something there. The fact that entire generations of human beings basically look there, and at TikTok and Instagram, for any topic probably also helps with that decision. It's still wildly bizarre to me when people don't mention the web in general in such a context, but one particular commercial website, which is largely about a video-based attention economy (and a rather classic economy via so-called influencers). Nothing about that sounds ideal to me when it comes to learning about actually useful topics from actual experts. Not even the media type: it's hard for them to hyperlink between content, and it's hard for me to search, to skip stuff, to reread a passage or two, to choose my own speed for each section, etc. Sure, you can find it somewhere there. In the same spirit, McD is a salad bar, though... ;)

> And this is 100% Google's fault for the algorithms they created that force these behaviors on anyone who wants to use their platform and have visibility.

Wrong assumptions. It's not a fault of theirs; a lot of it is probably by intent. It's just that they and you are not in the same boat. You are the product at big tech sites. It's 100% (impersonally) your fault to be sooo resistant to understanding that. ;)


LLMs are pretty good at attacking the "you don't know what you don't know" problem on a given topic.


You just state this as if it were obviously true, but I don't see how. Why is using an LLM like reading a pop sci book and not like reading a history book? Or even less like either, because you have to continually ask questions to get anything?


A history book is written by someone who knows the topic, and then reviewed by more people who also know the topic, and then it's out there where people can read it and criticize it if it's wrong about the topic.

A question asked to an AI is not reviewed by anyone, and it's ephemeral. The AI can answer "yes" today, and "no" tomorrow, so it's not possible to build a consensus on whether it answers specific questions correctly.


A pop sci book can be written by someone who knows the topic and reviewed by people who know the topic, and a history book can also fail to be.

LLM-generated answers are more comparable to a human expert's ad-hoc answers than to written books. But it's much simpler to statistically evaluate and correct them. That is how we can know that, on average, LLMs are improving and are outperforming human experts on an increasing number of tasks and topics.


In my experience, LLM-generated answers are more comparable to an ad-hoc answer by a human with no special expertise, moderate Google skills, but good bullshitting skills, who spends a few minutes searching the web, reads what they find and synthesizes it, waits long enough for the details to get kind of hazy, and then writes up an answer off the top of their head, filling in any missing material by just making something up. They can do this significantly faster than a human undergraduate student might be able to, so if you need someone to do this task very quickly / prolifically, this can be beneficial (e.g. for generating banter for video game non-player characters, for astroturfing social media, or for cheating on student essays read by an overworked grader). It's not a good way to get expert answers about anything, though.

More specifically: I've never gotten an answer from an LLM to a tricky or obscure question about a subject I already know anything about that seemed remotely competent. The answers to basic and obvious questions are sometimes okay, but also sometimes completely wrong (but confidently stated). When asked follow-up questions the LLM will repeatedly directly contradict itself with additional answers each as wrong as the first, all just as confidently stated.


More like "have already skimmed half of the entire Internet in the past", but yeah. That's exactly the mental model IMO one should have with LLMs.

Of course don't forget that "writing up an answer off the top of their head based on that, filling in any missing material by just making something up" is what everyone does all the time, and in particular it's what experts do in their areas of expertise. How often those snap answers and hasty extrapolations turn out correct is, literally, how you measure understanding.

EDIT:

There's some deep irony here, because with LLMs being "all system 1, no system 2", we're trying to give them the same crutches we use on the road to understanding, but have them move the opposite direction. Take "chain of thought" - saying "let's think step by step" and then explicitly going through your reasoning is not understanding - it's the direct opposite of it. Think of a student that solves a math problem step by step - they're not demonstrating understanding or mastery of the subject. On the contrary, they're just demonstrating they can emulate understanding by more mechanistic, procedural means.


Okay, but if you read written work by an expert (e.g. a book published by a reputable academic press or a journal article in a peer-reviewed journal), you get a result whose details were all checked out, and can be relied on to some extent. By looking up in the citation graph you can track down their sources, cross-check claims against other scholars', look up survey sources putting the work in context, think critically about each author's biases, etc., and it's possible to come to some kind of careful analysis of the work's credibility and assess the truth value of claims made. By doing careful search and study it's possible to get to some sense of the scholarly consensus about a topic and some idea of the level of controversy about various details or interpretations.

If instead you are reading the expert's blog post or hastily composed email or chatting with them on an airplane you get a different level of polish and care, but again you can use context to evaluate the source and claims made. Often the result is still "oh yeah this seems pretty insightful" but sometimes "wow, this person shouldn't be speculating outside of their area of expertise because they have no clue about this".

With LLM output, the appropriate assessment (at least in any that I have tried, which is far from exhaustive) is basically always "this is vaguely topical bullshit; you shouldn't trust this at all".


I am just curious about this. You said the word never, and I think your claim can be tested: perhaps you could post a list of five obscure questions, someone could put them to a good LLM for you, and an expert in that field could assess the value of the answers.

Edit: I just submitted an Ask HN post about this.


> I've never gotten an answer from an LLM to a tricky or obscure question about a subject I already know anything about that seemed remotely competent.

Certainly not my experience with the current SOTA. Without being more specific, it's hard to discuss. Feel free to name something that can be looked at.


The same is true of Google, no?


> A question asked to an AI is not reviewed by anyone, and it's ephemeral. The AI can answer "yes" today, and "no" tomorrow, so it's not possible to build a consensus on whether it answers specific questions correctly.

It's even more so with humans! Most of our conversations are, and have always been, ephemeral and unverifiable (and there are plenty of people who want to undo the little permanence and verifiability we still have on the Internet...). Along the dimension of permanence and verifiability, asking an LLM is actually much better than asking a human: there's always a log of the conversation you had with the AI, produced and stored somewhere for at least a while (even if only until you clear your temp folder), and if you can get ahold of that log, you can not just verify the answers, you can actually debug the AI. You can rerun the conversation with different parameters, different prompting, perhaps even inspect the inference process itself. You can do that ten times, a hundred times, a million times, and won't be asked to come to The Hague and explain yourself. Now try that with a human :).


The context of my comment was what is the difference between an AI and a history book. Or going back to the top comment, between an AI and an expert.

If you want to compare AI with ephemeral unverifiable conversations with uninformed people, go ahead. But that doesn't make them sound very valuable. I believe they are more valuable than that for sure, but how much, I'm not sure.


When I tried studying, I got really frustrated because I had to search for so many things, and not a lot of people would explain basic math things to me in a simple way.

LLMs already do a much better job at this. A lot faster, accurate enough, and easy to use.

I can now study something alone, which I was not able to do before.


> accurate enough

Ask it something non-trivial about a subject you are an expert in and get back to me.


Accurate enough for it to explain to me details of 101, 201 and 301 university courses in math or physics.

Besides, when I ask it about things like SRE, cloud, etc., it's a very good starting point.


Sadly, I lack expertise. Do you have any concrete examples? How does, say, the Wikipedia entry on the topic compare to your expert opinion?


Oh so you mean I have at my fingertips a tool that can generate me a Scientific American issue on any topic I fancy? That's still some non-negative utility right there :).


A Scientific American issue where the authors have no idea that they don’t know a topic, so they just completely make up the content, including the sources. At least magazine authors are reading the sources before misunderstanding the content (or asking the authors what the research means).

I don’t even trust the summaries after watching LLMs think we have meetings about my boss’s cat just because I mentioned it once as she sniffed the camera…


It's good not to trust it, but that's not the same as it having no idea. There is a lot of value in being close for many tasks!


I think it’s a very dangerous place to be in an area you’re not familiar with. I can read Python code and figure out if it’s what I want or not. I couldn’t read an article about physics and tell you what’s accurate and what’s not.

Legal Eagle has a great video on how ChatGPT was used to present a legal argument, including made-up case references! Stuff like this is why I’m wary of relying on it in areas outside of my expertise.


There’s a world of difference between blindly trusting an LLM and using it to generate clues for further research.

You wouldn’t write a legal argument based on what some random stranger told you, would you?


> Oh so you mean I have at my fingertips a tool that can generate me a Scientific American issue on any topic I fancy?

I’m responding to this comment, where I think it’s clear that an LLM can’t even achieve the goal the poster would like.

> You wouldn’t write a legal argument based on what some random stranger told you, would you?

I wouldn’t, but a lawyer actually went to court with arguments literally written by a machine, without verification.


> I’m responding to this comment, where I think it’s clear that an LLM can’t even achieve the goal the poster would like.

I know it can't - the one thing it's missing is the ability to generate coherent and correct (and not ugly) domain-specific illustrations and diagrams to accompany the text. But that's not a big deal, it just means I need to add some txt2img and img2img models, and perhaps some old-school computer vision and image processing algos. They're all there at my fingertips too, the hardest thing about this is finding the right ComfyUI blocks to use and wiring them correctly.

Nothing in the universe says an LLM has to do the whole job zero-shot, end-to-end, in a single interaction.

> I wouldn’t, but a lawyer actually went to court with arguments literally written by a machine, without verification.

And surely a doctor somewhere tried to heal someone with whatever was on the first WebMD page returned by Google. There are always going to be lazy lawyers and doctors doing stupid things; laziness is natural for humans. It's not a valid argument against tools that aren't 100% reliable and idiot-proof; it's an argument for professional licensure.


Your entire argument seems to be “it’s fine if you’re knowledgeable about an area,” which may be true. However, this entire discussion is in response to a comment who is explicitly not knowledgeable in the area they want to read about.

All the examples you give require domain knowledge which is the opposite of what OP wants, so I’m not sure what your issue is with what I’m saying.


> It's good not to trust it, but that's not the same as it having no idea. There is a lot of value in being close for many tasks!

The task is to replace Hazelcast with Infinispan in a stand-alone IMDG setup. You're interested in Locks and EntryProcessors.

ChatGPT 4 and o1 tell you, in their enthusiastic style, that Infinispan has all those features.

You test it locally and it does....

But the thing is, Infinispan doesn't have explicit locks in client-server mode, just in embedded mode, and that's something you find out from another human who has tried doing the same thing.
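For concreteness, here's a rough Java sketch of the difference (a sketch from memory, so treat the API names, like IMap.lock and AdvancedCache.lock, as assumptions to check against the docs rather than gospel):

    // Sketch only: contrasting Hazelcast's explicit per-key locks with what
    // Infinispan offers; API names are from memory, verify against the docs.
    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.map.IMap;

    public class ExplicitLockSketch {
        public static void main(String[] args) {
            HazelcastInstance hz = Hazelcast.newHazelcastInstance();
            IMap<String, Integer> counters = hz.getMap("counters");

            // Hazelcast: the same explicit per-key lock API is available whether
            // you run embedded or talk to the cluster through a client.
            counters.lock("order-42");
            try {
                Integer current = counters.getOrDefault("order-42", 0);
                counters.put("order-42", current + 1);
            } finally {
                counters.unlock("order-42");
            }

            // Infinispan embedded mode: something like
            //     cache.getAdvancedCache().lock("order-42");
            // exists, but (as far as I recall) only on a transactional cache
            // configured for pessimistic locking.
            //
            // Infinispan client-server (Hot Rod) mode: RemoteCache exposes no
            // explicit lock(...) at all, which is exactly the gap described above.

            hz.shutdown();
        }
    }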

Are you better off using ChatGPT in this case?

I could go on and on about the times ChatGPT has bullshitted me and wasted days of my time. But hey, it helps with one-liners, and Copilot occasionally has spectacular method auto-complete, learns some stuff on the fly, and makes me cry when it remembers random tidbits about me that not even family members do.


Given that I have never heard of any of {hazelcast, infinispan, IMDG, EntryProcessors}, even that kind of wrong would probably be an improvement by virtue of reducing the time I spend working on the wrong answer.

But only "probably" — the very fact that I've not heard of those things means I don't know if there's a potential risk from trying to push this onto a test server.

You do have a test server, and aren't just testing locally, right? Whatever this is?


> You do have a test server, and aren't just testing locally, right? Whatever this is?

Of course I didn't test in a client-server setup; that's how ChatGPT managed to fool me, because I know all those terms, and Infinispan was not the only alternative I looked up. Before trying Infinispan I tried Apache Ignite, and the API was the same for client-server and embedded mode; in Hazelcast the API was the same for client-server and embedded mode, so I just presumed it would be the same for Infinispan, AND I had ChatGPT reassuring me.

The takeaway about ChatGPT for me: if there are plenty of examples and plenty of knowledge out there, it's OK to trust it, but if you're pushing the envelope, the knowledge is obscure, and there aren't many examples, DO NOT TRUST it.

DO NOT assume that just because the information is in the documentation, ChatGPT has the knowledge or insight, and that you can cut corners by asking ChatGPT.

And it's not even obscure information: we've asked ChatGPT about the behavior of PostgreSQL batch upserts/locking, and it also failed to understand how that works.

Basically, I cannot trust it on anything that's hard. My 20 years of experience have made me wary of certain topics, and whenever those come up, I KNOW that I don't know, I KNOW that that particular topic is tricky, obscure, niche, that my output is low confidence, and that I need to slow down.

The more you use ChatGPT, the more likely it is to screw you over in subtle ways; I remember being very surprised at how such subtle bugs could arise EXACTLY in the pieces of code I deemed very unlikely to need tests.

I know our interns/younger folks use it for everything, and I just hope there are some ways to profit from people mindlessly using it.


> There is a lot of value in being close for many tasks!

horseshoes and hand-grenades?


Yes. Despite this apparently popular saying, "close enough" is sufficient in almost everything in life. Usually it's the best you can get anyway - and this is fine, because on most things, you can also iterate, and then the only thing that matters is that you keep getting closer (fast enough to converge in reasonable time, anyway).

Where "close" does not count, it suggests there's some artificial threshold at play. Some are unavoidable, some might be desirable to push through, but in general, life sucks when you surround yourself or enforce artificial hard cut-offs.


I notice that you've just framed most knowledge creation/discovery as a form of gradient descent.

Which it is, of course.
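A toy sketch of that framing, purely illustrative (one made-up parameter and a handful of made-up data points): each step only has to get a little closer, and "close enough" falls out of the iteration.

    // Toy illustration of "iterate until close enough" as gradient descent:
    // fit a single slope m so that y = m * x roughly matches a few points.
    public class IterateUntilCloseEnough {
        public static void main(String[] args) {
            double[] xs = {1, 2, 3, 4};
            double[] ys = {2.1, 3.9, 6.2, 7.8};   // roughly y = 2x, with noise
            double m = 0.0;
            double learningRate = 0.01;

            for (int step = 0; step < 1000; step++) {
                double grad = 0.0;
                for (int i = 0; i < xs.length; i++) {
                    // derivative of the squared error (m*x - y)^2 with respect to m
                    grad += 2 * (m * xs[i] - ys[i]) * xs[i];
                }
                m -= learningRate * grad / xs.length;   // nudge m a little closer
            }
            // Never exact, but close enough: prints a value near 1.99
            System.out.printf("m ~ %.3f%n", m);
        }
    }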


So they have reached human-level intelligence :D


Yes! But now you get a specific pop sci book _in any subject you want to learn about_ and _you can ask the book about comparisons_ (e.g. how were the Roman and Parthian legal systems similar?). This at least gives you a bunch of keywords to go wild with on Wikipedia and in publications (sci-hub! Cough! Sci-hub!)



