My mom told me yesterday that Paul Newman had massive problems with alcohol. I was somewhat skeptical, so this morning I asked ChatGPT a very simple question:
"Is Paul Newman known for having had problems with alcohol?"
All of the models up to o3-mini-high told me he had no known problems. Here's o3-mini-high's response:
"Paul Newman is not widely known for having had problems with alcohol. While he portrayed characters who sometimes dealt with personal struggles on screen, his personal life and public image were more focused on his celebrated acting career, philanthropic work, and passion for auto racing rather than any issues with alcohol. There is no substantial or widely reported evidence in reputable biographies or interviews that indicates he struggled with alcohol abuse."
There is plenty of evidence online that he struggled a lot with alcohol, including testimony from his long-time wife Joanne Woodward.
I sent my mom the ChatGPT reply and in five minutes she found an authoritative source to back her argument [1].
I use ChatGPT for many tasks every day, but I couldn't fathom that it would get something so simple so wrong.
Lesson(s) learned... Including not doubting my mother's movie trivia knowledge.
This is less an LLM thing than an information retrieval question. If you choose a model and tell it to “Search,” you get a citation-based analysis that confirms he did indeed have problems with alcohol. I do find it interesting that it quibbles over whether he was an alcoholic or not - it seems pretty clear from the rest that he was - but regardless.
This is indicative of something crucial when placing LLMs into a toolkit. They are not omniscient, nor are they deductive reasoning tools. Information retrieval systems are excellent at information retrieval and should be used for information retrieval. Solvers are excellent at solving deductive problems. Use them. That LLMs keep getting better at these tasks on their own is cool, but IMO it's a parlor trick, since we have nearly optimal or actually optimal techniques that don't need an LLM. The LLM should use those tools.
So, click search next time you have an information retrieval question.
https://chatgpt.com/share/67f2dac0-3478-8000-9055-2ae5347037...
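(For anyone doing this outside the ChatGPT UI, here's a minimal sketch of the same retrieve-then-answer pattern. The `web_search` function is a hypothetical stand-in for whatever search API you actually have access to; only the shape of the pattern - retrieve first, then let the model answer from the retrieved snippets - is the point.)

```python
# Minimal retrieve-then-answer sketch (assumptions: the `openai` Python package
# >= 1.0 and an OPENAI_API_KEY in the environment; `web_search` is a hypothetical
# placeholder for a real search API).
from openai import OpenAI

client = OpenAI()


def web_search(query: str) -> list[str]:
    """Hypothetical stand-in: return a few text snippets for the query."""
    raise NotImplementedError("plug in your search provider here")


def answer_with_retrieval(question: str) -> str:
    # Retrieve first, then let the model answer only from what was retrieved.
    snippets = web_search(question)
    context = "\n\n".join(snippets)
    resp = client.chat.completions.create(
        model="gpt-4o",  # any chat model works for this pattern
        messages=[
            {"role": "system",
             "content": "Answer only from the provided sources and cite them. "
                        "If the sources do not answer the question, say so."},
            {"role": "user",
             "content": f"Sources:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content


# answer_with_retrieval("Is Paul Newman known for having had problems with alcohol?")
```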
I realise your answer wasn't assertive, but if I heard this from someone actively defending AI it would be a cop-out. If the selling point is that you can ask these AIs anything, then one can't retroactively go "oh, but not that" when a particular query doesn't pan out.
My point is the opposite of this point of view. I believe generative AI is the most significant advance since hypertext and the overlay of inferred semantic relationships via PageRank etc. In fact, it was the creation of hypertext and the toolchains around it that led to this point at all - neural networks were already understood by then, and transformer attention is just an innovation. It's the collective human assembly of interconnected language and visual knowledge at a pan-cultural and global scale that enabled the current state.
The ability of LLMs alone to do natural language processing beyond anything prior, by unthinkable Turing-test-passing miles, is astounding. The fact that they can reason abductively, which computing techniques to date have been unable to do, is amazing. The fact that you can mix them with multimodal regimes - images, motion, virtually anything that can be semantically linked via language - is breathtaking. The fact that they can be augmented with prior computing techniques - IR, optimization, deductive solvers, and literally everything we've achieved to date - should give anyone knowledgeable of such things shivers for what the future holds.
I would never hold that generative AI techniques are replacements for known optimal techniques, but the ensemble is probably the solution to nearly every challenge we face. When we hit the limits of LLMs today, I think, well, at least we already have grandmaster-beating chess engines, and it's irrelevant that the LLM can't do it directly. The LLM and other generative AI techniques, in my mind, are like gases that fill in, through learned approximation, the things we've not been able to solve directly, including the ad hoc assembly of those solutions. This is why, since the first time BERT came along, I knew agent-based techniques were the future.
Right now we live at a time like early hypertext with respect to AI. Toolchains suck, and LLMs are basically GeoCities pages with “under construction” signs. We will go through an explosive exploration, some stunning insights that'll change the basic nature of our shared reality (some wonderful, some insidious), then, if we aren't careful - and we rarely are - enshittification at a scale unseen before.
This is a bit of a strawman. There are certainly people who claim that you can ask AIs anything but I don't think the parent commenter ever made that claim.
"AI is making incredible progress but still struggles with certain subsets of tasks" is self-consistent position.
Any information found in a web search about Newman will be available in the training set (more or less). It's almost certainly a problem of alignment / "safety" causing this issue.
"Any information found in a web search about Newman will be available in the training set"
I don't think that is a safe assumption these days. Training a modern LLM isn't about dumping in everything on the Internet. To get a really good model you have to be selective about your sources of training data.
They still rip off vast amounts of copyrighted data, but I get the impression they are increasingly picky about what they dump into their training runs.
There’s a simpler explanation than that: the model weights aren’t an information retrieval system, and other sequences of tokens are more likely given the totality of training data. This is why, for an information retrieval task, you use an information retrieval tool, just as for driving nails you use a hammer rather than a screwdriver. It may very well be that you could drive the nail with the screwdriver, but why?
You think that's a simpler explanation? Ok. I think given the amount of effort that goes into "safety" on these systems that my explanation is vastly more likely than somehow this information got lost in the vector soup despite being attached to his name at the top of every search result[0].
Except if safety blocked this, it would have also blocked the linked conversation. Alignment definitely distorts the behavior of models, but treating them as information retrieval systems is using a screwdriver to drive nails. Your example didn’t refute this.
Though a local model I'm running (gemma-3-27b-it; https://huggingface.co/lmstudio-community/gemma-3-27b-it-GGU...) just told me various correct sounding bits about his history with alcohol (correctly citing his alma mater and first wife), but threw in:
"Sobriety & AA: Newman got sober in 1964 and remained so for the rest of his life."
Which doesn't check out. And it includes plausible but completely hallucinated URLs (as well as a valid biography.com URL that completely omits information about alcohol.)
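(If anyone wants to reproduce a local test like this, here's a minimal sketch assuming LM Studio's OpenAI-compatible server on its default port; the base URL, placeholder API key, and model identifier are assumptions to adjust for your own setup.)

```python
# Sketch: querying a locally served model through an OpenAI-compatible endpoint
# (assumption: LM Studio's server on its default http://localhost:1234/v1 with a
# Gemma 3 27B build loaded; adjust base_url and model to your setup).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="gemma-3-27b-it",  # use whatever identifier your local server reports
    messages=[{
        "role": "user",
        "content": "Is Paul Newman known for having had problems with alcohol?",
    }],
    temperature=0.2,
)
print(resp.choices[0].message.content)
```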
Gemma 3 4B (QAT quant):
Yes, Paul Newman was indeed known to have struggled with alcohol throughout his life. While he maintained a public image of a charming, clean-cut star, he privately battled alcoholism for many years. He sought treatment in the late 1980s and early 1990s and was reportedly very open about his struggles and the importance of seeking help.
> Yes, Paul Newman was known to have struggled with alcohol at certain points in his life. In his early years, particularly during his time in the Navy and into his acting career, Newman admitted to heavy drinking. He was open about his fondness for beer and once jokingly referred to himself as a “functional alcoholic.” In a 1988 interview with The New York Times, he acknowledged that he had a period where he drank too much, stating, “I was a very good drinker. I could put it away.” ...
Does the as-yet-unwritten prequel to Idiocracy tell the tale of when we started asking AI chatbots for facts, and this was the point of no return for humanity?
I like that it's unmonetized, of course, but that's not why I use AI. I use AI because it's better at search. When I can't remember the right keywords to find something, or when the keywords aren't unique, I frequently find that web search doesn't return what I need and AI does.
It's impressive how often AI returns the right answer to vague questions. (not always though)
Can you blame the users for asking it, when everyone is selling that as a key defining feature?
I use it for asking - often very niche - questions on advanced probability and simulation modeling, and it often gets those right - why those and not a simple verifiable fact about one of the most popular actors in history?
I don’t know about Idiocracy, but something that I have read specific warnings about is that people will often blame the user for the tool’s shortcomings.
Counterpoint: Paul Newman was absolutely a famous drunk, as evidenced by this Wikipedia page.* Any query for "paul newman alcohol" online will return dozens of reputable sources on the topic. Your post is easily interpretable as handwaving apologetics, and it gives big "It's the children who are wrong" energy.
How else does an LLM distinguish what is widely known, given there are no statistics collected on the general population's awareness of any given celebrity's vices? Robo-apologetics in full force here.
This may have hit the nail on the head about the weaknesses of LLMs.
They're going to regurgitate something not so much based on facts, but based on things that are accessible as perceived facts. Those might be right, but they might also be wrong; and no one can tell without doing the hard work of checking original sources. Many of the things considered accepted facts, which are also accessible to LLM harvesting, are at best derived facts, often mediated by motivated individuals and published to accessible sources by "people with an interest".
The weightings used by any AI should be based on the facts, and not on the compounded volume of derived, "mediated", or "directed" facts - simply because they're not really facts; they're reports.
It all seems like dumber, lazier search engine stuff. Honestly, what do I know about Paul Newman? But Joanne Woodward and others who knew and worked with him should be weighted as being at least slightly more credible than others, no matter how many text patterns "catch the match" flow.
> I use ChatGPT for many tasks every day, but I couldn't fathom that it would get something so simple so wrong.
I think we'll have a term like we have for parents/grandparents that believe everything they see on the internet but specifically for people using LLMs.
That's not really 'simple' for an LLM. This is niche information about a specific person. LLMs train on massive amounts of data, and the more a topic is present in the data, the better the answers will be.
Also, you can/should use the "research" mode for questions like this.
The question is simple and verifiable - it is impressive to me that it’s not contained in the LLM’s body of knowledge - or rather that it can’t reach the answer.
This is niche in the grand scheme of knowledge but Paul Newman is easily one of the biggest actors in history, and the LLM has been trained on a massive corpus that includes references to this.
Where is the threshold for topics with enough presence in the data?
3-4 hours is enough time for it to have crawled the Hacker News comments section. That's about the frequency the AI bots crawl my little out-of-the-way blog.
Excluding the ones that do not support chat completions, all but one (qwen-qwq-32b) answered in the affirmative. The answer from qwen-qwq-32b said:
> Paul Newman, the renowned actor and humanitarian, did not have a widely publicized struggle with alcohol addiction throughout most of his life, but there were specific instances that indicated challenges.
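(For anyone who wants to run the same comparison, here's a minimal sketch of looping one question over an OpenAI-compatible chat completions endpoint; the base URL, API key, and model list are placeholders, not the exact ones used above.)

```python
# Sketch: asking several hosted models the same question via an OpenAI-compatible
# chat completions API. BASE_URL, API_KEY, and MODELS are placeholders; fill in
# whatever your provider actually offers.
from openai import OpenAI

BASE_URL = "https://example-provider.invalid/v1"  # placeholder
API_KEY = "your-api-key"                           # placeholder
MODELS = ["model-a", "model-b", "qwen-qwq-32b"]    # fill in real model IDs

client = OpenAI(base_url=BASE_URL, api_key=API_KEY)
question = "Is Paul Newman known for having had problems with alcohol?"

for model in MODELS:
    try:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": question}],
        )
        print(f"--- {model} ---\n{resp.choices[0].message.content}\n")
    except Exception as exc:  # e.g. a model that doesn't support chat completions
        print(f"--- {model} --- skipped: {exc}\n")
```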
Using lack of progress in a specialized field as a barometer for overall progress is kind of silly. I just spent the last few days 'vibe coding' an application and I have to say that it's pretty remarkable how capable it is now relative to my experience last year.
It took three minutes for me to do the above from the time I created my API key to when I had an answer.
I find that everyone who replies with examples like this is an expert using expert skills to get the LLM to perform. Which makes me wonder: why is this a skill that is useful to the general public, as opposed to just another skill for technical knowledge workers to add to their tool belt?
I agree. But I will say that at least in my social circles I'm finding that a lot of people outside of tech are using these tools, and almost all of them seem to have a healthy skepticism about the information they get back. The ones that don't will learn one way or the other.
Yes, Paul Newman was known for being a heavy drinker, particularly of beer.[1] He acknowledged his high consumption levels himself. While he maintained an incredibly successful career and public life, accounts and biographies note his significant alcohol intake, often describing it as a functional habit rather than debilitating alcoholism, although the distinction can be debated. He reportedly cut back significantly in his later years.
[1] Review: Paul Newman memoir stuns with brutal honesty - AP News (apnews.com)
Yes, Paul Newman did experience significant struggles with alcohol. In his posthumously published memoir, The Extraordinary Life of an Ordinary Man, Newman candidly discusses his drinking habits and acknowledges his long-term battle with alcoholism. He describes himself as a "functioning alcoholic," a trait he noted was shared with his father. At one point, Newman was reported to consume a case of beer daily, followed by spirits, until he eventually gave up hard liquor.
These models are not reliable sources of information. They are either out of date, subject to hallucination, or just plain wrong for a variety of reasons. They are untrustworthy to ask facts like this.
I appreciate your consideration of a subjective question, how you explained it, and your understanding of these nuances. But please - do not trust ChatGPT etc. I continue to be frustrated by the endless people claiming something is true because ChatGPT said so. I support the conclusions of this author.
For questions like this, you need to tell the bot to search the web.
> Hi, can you search the web and find out if Paul Newman has any alcohol problems?
> Paul Newman, the acclaimed actor and philanthropist, had a well-documented fondness for alcohol, particularly beer. He was known to consume significant quantities, with some accounts noting he would drink a case of beer in a day. This level of consumption led some to describe him as a functioning alcoholic. His drinking habits reportedly caused strain in his marriage to Joanne Woodward. In the late 1970s, an incident occurred where Woodward did not allow Newman back into their house because of his drinking. He promised to give up hard liquor as a compromise, but continued to drink beer, sometimes consuming a case during dinner.
Perplexity had no problem with the question, replying in part, "Paul Newman is known to have struggled with alcohol throughout his life, as revealed in various accounts, including his own memoir and documentaries about his life...."
I don't think this reflects a flaw in LLMs at all. It is true that Paul Newman is not "known for" having alcohol problems. Paul Newman is known for acting, and salad dressing. Paul Newman has been "known to" have had alcohol problems, which, formed as a question, correctly one-shots accurate results for me across a few common LLMs. Someone like Betty Ford could be "known for" having alcohol problems. Paul Newman's alcohol problems are relatively specialized knowledge.
Good point, though a great LLM answer would DWIM here and explain both interpretations as you have done, as opposed to (what presumably happened) focusing on one and ignoring the other.
LLMs aren't good at being search engines, they're good at understanding things. Put an LLM on top of a search engine, and that's the appropriate tool for this use case.
I guess the problem with LLMs is that they're too usable for their own good, so people don't realize that they can't perfectly know all the trivia in the world, exactly the same as any human.
A stochastic parrot with a sufficiently tiny residual error rate needs a stochastic model that compresses the world so precisely, and decompression algorithms so sophisticated, that it could be called reasoning.
Take two 4K frames of a falling vase and ask a model to predict the next token... I mean, the following images. Your model now needs to include some approximation of physics - and the ability to apply it correctly - to produce a realistic outcome. I'm not aware of any model capable of doing that, but that's what it would mean to predict the unseen with high enough fidelity.
I've always been amazed by this. I have never not been frustrated with the profound stupidity of LLMs. Obviously I must be using them differently, because I've never been able to trust them with anything, and more than half the time I fact-check them - even for information retrieval - the output is objectively incorrect.
If you got as far as checking the output it must have appeared to understand your question.
I wouldn't claim LLMs are good at being factual, or good at arithmetic, or at drawing wine glasses, or that they are "clever". What they are very good at is responding to questions in a way which gives you the very strong impression they've understood you.
I vehemently disagree. If I ask a question with an objective answer, and it simply makes something up and is very confident the answer is correct, what the fuck has it understood other than how to piss me off?
It clearly doesn't understand that the question has a correct answer, or that it does not know the answer. It also clearly does not understand that I hate bullshit, no matter how many dozens of times I prompt it not to make something up and that I would prefer an admission of ignorance.
It didn't understand you but the response was plausible enough to require fact checking.
Although that isn't literally indistinguishable from 'understanding' (because your fact-checking easily discerned the difference), it suggests that at a surface level it did appear to understand your question and knew what a plausible answer might look like. This is not necessarily useful, but it's quite impressive.
There are times it just generates complete nonsense that has nothing to do with what I said, but it's certainly not most of the time. I do not know how often, but I'd say it's definitely under 10% and almost certainly under 5% that the above happens.
Sure, LLMs are incredibly impressive from a technical standpoint. But they're so fucking stupid I hate using them.
> This is not necessarily useful but it's quite impressive.
For them to work at all they need to have some representation of concepts. Recent research at Anthropic has shown surprising complexity in their reasoning behavior. Perhaps the parrot here is you.
It's the first time I've ever used that phrase on HN. Anyway, what phrase do you think works better than 'stochastic parrot' to describe how LLMs function?
It’s good rhetoric but bad analogy. LLMs can be very creative (to the point of failure, in hallucinations).
I don’t know if there is a pithy short phrase to accurately describe how LLMs function. Can you give me a similar one for how humans think? That might spur my own creativity here.
Try to come up with a way to prove humans aren't stochastic parrots; then maybe people will start taking you seriously. Just childish Reddit angst rn, nothing else.
I hate to be the burden-of-proof guy, but in this case I'll say: the burden of proof is on you to prove that humans are stochastic parrots. For millennia, nobody thought to assert that the human brain was computational in nature, until people invented computers and all of a sudden started asserting that the human brain was just like a classical computer.
Of course, this turned out to be completely false, with advances in understanding of neural networks. Now, again with no evidence other than "we invented this thing that's useful to us," people have been asserting that humans are just like this thing we invented. Why? What's the evidence? There never is any. It's high dorm room behavior. "What if we're all just machines, man???" And the argument is always that if I disagree with you when you assert this, then I am acting unscientifically and arguing for some kind of magic.
But there's no magic. The human brain just functions in a way different than the new shiny toys that humans have invented, in terms of ability to model an external world, in terms of the way emotions and sense experience are inseparable from our capacity to process information, in terms of consciousness. The hardware is entirely different, and we're functionally different.
The closest things to human minds are out there, and they've been out there for as long as we have: other animals. The real unscientific perspective is to get high on your own supply and assert that some kind of fake, creepily ingratiating Spock we made up (who is far less charming than Leonard Nimoy) is more like us than a chimp is.
> Try to come up with a way to prove humans aren't stochastic parrots
Look around you
Look at Skyscrapers. Rocket ships. Agriculture.
If you want to make a claim that humans are nothing more than stochastic parrots then you need to explain where all of this came from. What were we parroting?
Meanwhile all that LLMs do is parrot things that humans created
Skyscrapers: trees, mountains, cliffs, caves in mountainsides, termite mounds, humans knew things could go high, the Colosseum was built two thousand years ago as a huge multi-storey building.
Rocket ships: volcanic eruptions show heat and explosive outbursts can fling things high, gunpowder and cannons, bellows showing air moves things.
Agriculture: forests, plains, jungle, desert oases, humans knew plants grew from seeds, grew with rain, grew near water, and grew where animals trampled them into the ground.
We need a list of all attempted ideas, all inventions and patents that were ever tried or conceived, and then we'd see that inventions are the same random permutations on ideas, with Darwinian-style survivorship, as everything else; there were steel boats with multiple levels in them before skyscrapers; is the idea of a tall steel building really so magical when there were over a billion people on Earth in 1800 who could have come up with it?
You’re likening actual rocketry to LLMs being mildly successful at describing Paul Newman’s alcohol use on average when they already have the entire internet handed to them.
> when there were over a billion people on Earth in 1800 who could have come up with it
My point is that humans did come up with it. Humans did not parrot it from someone or something else that showed it to us. We didn't "parrot" splitting the atom. We didn't learn how to build skyscrapers from looking at termite hills and we didn't learn to build rockets that can send a person to the moon from seeing a volcano
It's obvious that humans imitate concepts and don't come up with things de novo from a blank slate of pure intelligence. So your claim hinges on LLMs parroting the words they are trained on. But they don't do that; their training makes them abstract over concepts and remix them in new ways to output sentences they weren't trained on, e.g.:
Prompt: "Can you give me a URL with some novel components, please?"
A living parrot echoing "pieces of eight" cannot do this; it cannot say "pieces of <currency>" or "pieces of <valuable mineral>" even if asked to do that. The LLM training has abstracted some concept of what it means for a text pattern to be a URL, what it means for things to be "novel", and what it means to switch out the components of a URL but keep them individually valid. It can also give a reasonable answer when asked for a new kind of protocol. So your position hinges on the word "stochastic", which is used as a slur to mean "the LLM isn't innovating like we do, it's just a dice roll of remixing parts it was taught". But if you are arguing that makes it a "stochastic parrot", then you need to consider splitting the atom in its wider context...
> "We didn't "parrot" splitting the atom"
That's because we didn't "split the atom" in one blank-slate experiment with no surrounding context. Rutherford and team disintegrated the atom in 1914-1919 ish; they were building on the surrounding scientific work happening at that time: 1869, Johann Hittorf recognising that there was something coming in a straight line from or near the cathode of a Crookes vacuum tube; 1876, Eugen Goldstein proving they were coming from the cathode and naming them cathode rays (see: Cathode Ray Tube computer monitors); and 1897, J. J. Thomson proving the rays are much lighter than the lightest known element and naming them electrons, the first proof of sub-atomic particles existing. He proposed the model of the atom as a 'plum pudding' (concept parroting). Hey, guess who J. J. Thomson was an academic advisor of? Ernest Rutherford! 1911: Rutherford's discovery of the atomic nucleus. 1909: Rutherford demonstrated sub-atomic scattering and Millikan determined the charge on an electron. Eugen Goldstein also discovered the anode rays travelling the other way in the Crookes tube, and that was picked up by Wilhelm Wien and became Mass Spectrometry for identifying elements. In 1887 Heinrich Hertz was investigating the photoelectric effect, building on the work of Alexandre Becquerel, Johann Elster, Hans Geitel. Dalton's atomic theory of 1803.
Not to mention Rutherford's 1899 studies of radioactivity, following Henri Becquerel's work on Uranium, following Marie Curie's work on Radium and her suggestion of radioactivity being atoms breaking up, and Rutherford's student Frederick Soddy and his work on Radon, and Paul Villard's work on Gamma Ray emissions from Radon.
When Philipp Lenard was studying cathode rays in the 1890s he bought up all the supply of one phosphorescent material which meant Röntgen had to buy a different one to reproduce the results and bought one which responded to X-Rays as well, and that's how he discovered them - not by pure blank-sheet intelligence but by probability and randomness applied to an earlier concept.
That is, nobody taught humans to split the atom and then humans literally parroted the mechanism and did it, but your attempting to present splitting the atom as a thing which appeared out of nowhere and not remixing any existing concepts is, in your terms, absolute drivel. Literally a hundred years and more of scientists and engineers investigating the subatomic world, proposing that atoms could be split, and trying to work out what's in them by small variations on the ideas and equipment and experiments seen before. You can just find names and names and names on Wikipedia of people working on this stuff, being inspired by others' work and remixing the concepts in it, and we all know the 'science progresses one death at a time' idea: individual people pick up what they learned and stick with it until they die, and new ideas and progress need new people to do variations on the ideas which exist.
No, people didn't learn to build rockets from "seeing a volcano," but if you think there was no inspiration from fireworks, cannons, jellyfish squeezing water out to accelerate, no studies of orbits from moons and planets, no chemistry experiments, no inspiration from thousands of years of flamethrowers (https://en.wikipedia.org/wiki/Flamethrower#History), no seeing explosions move large things, you're living in a dream.
> fireworks, cannons, jellyfish squeezing water out to accelerate, no sudies of orbits from moons and planets, no chemistry experiments, no inspiration from thousands of years of flamethrowers
Fireworks, cannons, chemistry experiments and flamethrowers are all human inventions
And yes, exactly! We studied orbits of moons and planets. We studied animals like Jellyfish. We choose to observe the world, we extracted data, we experimented, we saw what worked, refined, improved, and succeeded
LLMs are not capable of observing anything. They can only regurgitate and remix the information they are fed by humans! By us, because we can observe
An LLM trained on 100% wrong information will always return wrong information for anything you ask it.
Say you train an LLM with the knowledge that fire can burn underwater. It "thinks" that the step-by-step instructions for building a fire are to pile wood and then pour water on the wood. It has no conflicting information in its model. It cannot go try to build a fire this way and observe that it is wrong. It is a parrot. It repeats the information that you give it. At best it can find some relationships between data points that humans haven't realized might be related.
A human could easily go attempt this, realize it doesn't work, and learn from the experience. Humans are not simply parrots. We are capable of exploring our surroundings and internalizing things without needing someone else to tell us how everything works
> That is, nobody taught humans to split the atom and then humans literally parroted the mechanism and did it, but your attempting to present splitting the atom as a thing which appeared out of nowhere and not remixing any existing concepts is, in your terms, absolute drivel
Building on the work of other humans is not parroting
You outlined the absolute genius of humanity building from first principles all the way to splitting the atom, and you still think we're just parroting.
An ability to answer questions with a train of thought showing how the answer was derived, or the self-awareness to recognize that it does not have the ability to answer the question and to declare as much. More than half the time I've used LLMs they will simply make answers up, and when I point out the answer is wrong they simply regurgitate another incorrect answer ad nauseam (regularly cycling through answers I've already pointed out are incorrect).
Rather than give you a technical answer - if I ever feel like an LLM can recognize its limitations rather than make something up, I would say it understands. In my experience LLMs are just algorithmic bullshitters. I would consider a function that just returns "I do not understand" to be an improvement, since most of the time I get confidently incorrect answers instead.
Yes, I read Anthropic's paper from a few days ago. I remain unimpressed until talking to an LLM isn't a profoundly frustrating experience.
> I guess the problem with LLMs is that they're too usable for their own good, so people don't realize that they can't perfectly know all the trivia in the world, exactly the same as any human.
They're quite literally being sold as a replacement for human intellectual labor by people that have received uncountable sums of investment money towards that goal.
The author of the post even says this:
"These machines will soon become the beating hearts of the society in which we live. The social and political structures they create as they compose and interact with each other will define everything we see around us."
Can't blame people for "fact checking" something that's supposed to fill these shoes.
People should be (far) more critical of LLMs given all of these style of bold claims, not less.
Also, telling people they're "holding it wrong" when they interact with alleged "Ay Gee Eye" "superintelligence" really is a poor selling point, and no way to increase confidence in these offerings.
These people and these companies don't get to make claims that threaten the livelihoods of millions of people, inflate a massive bubble, impact hiring decisions and everything else we've seen, and then get excused because "whoops, you're not supposed to use it like that, dummy."
Your point is still trivially disproven by the fact that not even humans are expected to know all the world's trivia off the top of their heads.
We can discuss whether LLMs live up to the hype, or we can discuss how to use this new tool in the best way. I'm really tired of HN insisting on discussing the former, and I don't want to take part in that. I'm happy to discuss the latter, though.
> Put an LLM on top of a search engine, and that's the appropriate tool for this use case.
Hm, nope: now that the web is flooded with LLM-generated content, it's game over. I can't tell you how many times I almost got fooled by recipes & co. which seem legit at first but are utter nonsense. And now we're feeding that garbage back to where it came from.
Ironically, though, an LLM-powered search engine (some word about being perplexed) is becoming way better than the undisputed king of traditional search engines (something oogle).
It expands what they had before with AI Overviews, but I’m not sure how new either of those are. It showed up for me organically as an AI Mode tab on a native Google search in Firefox ironically.
It asks me to change some permissions, but that help page says this is only available in the US, so I suppose I'll get blocked right after I change them.
Ah, but isn’t that the problem here - asking an LLM for facts without requesting a search is like asking a PhD to answer a question “off the top of your head”. For pop culture questions the PhD likely brings little value.
They should know better than to guess. Educated, honest, intelligent people don't spout off a wild-ass guess; if they don't know something, they say so.
I don't think they mean "knowledge" when they talk about "intelligence." LLMs are definitely not knowledge bases. They can transform information given to them in impressive ways, but asking a raw (non-RAG-enabled) LLM to provide its own information will probably always be a mistake.
They kind of are knowledge bases, just not in the usual way. The knowledge is encoded in the words they were trained on. They weren't trained on words chosen at random; they were trained on words written by humans to encode some information. In fact, that's the only thing that makes LLMs somewhat useful.
I can’t reproduce. Maybe others reported the error and someone adjusted the expected answer, I do not know enough about OpenAI operations to say for sure.
The reason this bothers me is that comments like this reinforce the beliefs of people who could otherwise find value in these tools.
But I think points like this would be better made in shared chats or screenshots, since we do not have something like a core dump or stacktrace to attach.
And while I am not saying OP did this, I have seen technically skilled engineers asserting/implying that LLMs/chatbots aren't good or aren't useful to them, whose chat logs contain a multitude of things that I am sure would impact the result of the query.
Yes, it can be a UX problem. Yes, it can be an algorithmic problem.
But they are just tools that can be used wrong and not a perfect mechanical brain.
Yes, Paul Newman struggled with alcohol. His issues with alcohol were explored in the HBO Max documentary, The Last Movie Stars, and Shawn Levy's biography, Paul Newman: A Life. According to a posthumous memoir, Newman was tormented by self-doubt and insecurities and questioned his acting ability. His struggles with alcohol led to a brief separation from Joanne Woodward, though it had nothing to do with cheating.
(4x Source footnotes omitted for readability)
# Ki Multi-step Research Assistant
Paul Newman is known to have struggled with alcohol. According to his posthumous memoir, Newman candidly discussed his issues with drinking and self-doubt, describing himself as an alcoholic who was tormented by insecurities[^1][^2]. He reportedly drank a significant amount of beer daily and later moved on to stronger drinks like Scotch[^3][^4]. His drinking habits were a notable part of his life, and he was often identified by his beer drinking[^5][^6]. Despite these struggles, Newman was also recognized for his generosity and devotion to his family[^7].
> "According to a posthumous memoir, Newman was tormented by self-doubt and insecurities and questioned his acting ability. His struggles with alcohol led to a brief separation from Joanne Woodward, though it had nothing to do with cheating."
'though it had nothing to do with cheating' is a weird inclusion.
You may know this, but that doesn't mean we all do. LLMs are marketed towards many different groups, including folks who don't have much tech knowledge.
So, in other words, are you saying that AI model progress is the real deal and is not bullshit?
That is, as you point out, "all of the models up to o3-mini-high" give an incorrect answer, while other comments say that OpenAI's later models give correct answers, with web citations. So it would seem to follow that "recent AI model progress" actually made a verifiable improvement in this case.
I am pretty sure that they must have meant "up through", not "up to", as the answer from o3-mini-high is also wrong in a way which seems to fit the same description, no?
I tried with 4o and it gave me what I thought was a correct answer:
> Paul Newman was not publicly known for having major problems with alcohol in the way some other celebrities have been. However, he was open about enjoying drinking, particularly beer. He even co-founded a line of food products (Newman’s Own) where profits go to charity, and he once joked that he consumed a lot of the product himself — including beer when it was briefly offered.
> In his later years, Newman did reflect on how he had changed from being more of a heavy drinker in his youth, particularly during his time in the Navy and early acting career, to moderating his habits. But there’s no strong public record of alcohol abuse or addiction problems that significantly affected his career or personal life.
> So while he liked to drink and sometimes joked about it, Paul Newman isn't generally considered someone who had problems with alcohol in the serious sense.
As others have noted, LLMs are much more likely to be cautious about providing information that could be construed as libel. While Paul Newman may have been an alcoholic, I couldn't find any articles about it being "public" in the same way as others, e.g. with admitted rehab stays.
Unless you're expecting an LLM to have access to literally all information on earth at all times I find it really hard to care about this particular type of complaint.
My calculator can't conjugate German verbs. That's fine IMO. It's just a tool
Yes, but a tool for what? When asked a question, individuals who don't already have detailed knowledge of a topic are left with no way to tell if the AI-generated response is complete bullshit, uselessly superficial, or detailed and on point. The only way to be sure is to then go do the standard search engine grovel looking for authoritative sources.
"Is Paul Newman known for having had problems with alcohol?"
All of the models up to o3-mini-high told me he had no known problems. Here's o3-mini-high's response:
"Paul Newman is not widely known for having had problems with alcohol. While he portrayed characters who sometimes dealt with personal struggles on screen, his personal life and public image were more focused on his celebrated acting career, philanthropic work, and passion for auto racing rather than any issues with alcohol. There is no substantial or widely reported evidence in reputable biographies or interviews that indicates he struggled with alcohol abuse."
There is plenty of evidence online that he struggled a lot with alcohol, including testimony from his long-time wife Joanne Woodward.
I sent my mom the ChatGPT reply and in five minutes she found an authoritative source to back her argument [1].
I use ChatGPT for many tasks every day, but I couldn't fathom that it would get so wrong something so simple.
Lesson(s) learned... Including not doubting my mother's movie trivia knowledge.
[1] https://www.newyorker.com/magazine/2022/10/24/who-paul-newma...