I get so confused on this. I play around, test, and mess with LLMs all the time and they are miraculous. Just amazing, doing things we dreamed about for decades. I mean, I can ask for obscure things with subtle nuance where I misspell words and mess up my question and it figures it out. It talks to me like a person. It generates really cool images. It helps me write code. And just tons of other stuff that astounds me.
And people just sit around, unimpressed, and complain that ... what ... it isn't a perfect superintelligence that understands everything perfectly? This is the most amazing technology I've experienced as a 50+ year old nerd who has been sitting deep in tech for basically my whole life. This is the stuff of science fiction, and while there totally are limitations, the speed at which it is progressing is insane. And people are like, "Wah, it can't write code like a senior engineer with 20 years of experience!"
The technology is not just less than superintelligence, for many applications it is less than prior forms of intelligence like traditional search and Stack Exchange, which were easily accessible 3 years ago and are in the process of being displaced by LLMs. I find that outcome unimpressive.
And this Tweeter's complaints do not sound like a demand for superintelligence. They sound like a demand for something far more basic than the hype has been promising for years now.
- "They continue to fabricate links, references, and quotes, like they did from day one."
- "I ask them to give me a source for an alleged quote, I click on the link, it returns a 404 error." (Why have these companies not manually engineered out a problem like this by now? Just do a check to make sure links are real. That's pretty unimpressive to me.)
- "They reference a scientific publication, I look it up, it doesn't exist."
- "I have tried Gemini, and actually it was even worse in that it frequently refuses to even search for a source and instead gives me instructions for how to do it myself."
- "I also use them for quick estimates for orders of magnitude and they get them wrong all the time. "
- "Yesterday I uploaded a paper to GPT to ask it to write a summary and it told me the paper is from 2023, when the header of the PDF clearly says it's from 2025. "
A municipality in Norway used an LLM to create a report about the school structure in the municipality (how many schools there are, how many there should be, where they should be, how big they should be, pros and cons of different school and class sizes, etc.). Turns out the LLM invented scientific papers to use as references and the whole report is complete and utter garbage based on hallucinations.
I agree. I use LLMs heavily for gruntwork development tasks (porting shell scripts to Ansible is an example of something I just applied them to). For these purposes, they work well. LLMs excel in situations where you need repetitive, simple adjustments on a large scale, e.g. swap every Postgres insert query with the corresponding MySQL insert query.
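For illustration, here's a minimal sketch of the kind of mechanical rewrite I mean. The regex and the single dialect rule are simplified stand-ins rather than a complete converter, and the table/column names are made up:

```python
import re

def pg_insert_to_mysql(query: str) -> str:
    """Translate a Postgres 'ON CONFLICT DO NOTHING' insert into MySQL's
    'INSERT IGNORE'; anything else is returned unchanged for human review."""
    pattern = re.compile(
        r"^\s*INSERT\s+INTO\s+(.*?)\s+ON\s+CONFLICT\s+DO\s+NOTHING\s*;?\s*$",
        re.IGNORECASE | re.DOTALL,
    )
    m = pattern.match(query)
    if m:
        return f"INSERT IGNORE INTO {m.group(1)};"
    return query  # anything this can't handle gets flagged for manual review

print(pg_insert_to_mysql(
    "INSERT INTO users (id, name) VALUES (1, 'a') ON CONFLICT DO NOTHING;"
))
# INSERT IGNORE INTO users (id, name) VALUES (1, 'a');
```

The LLM's job is essentially hundreds of edits of this shape, which is also why the output is worth spot-checking: ON CONFLICT and INSERT IGNORE are not semantically identical in every case.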
A lot of the "LLMs are worthless" talk I see tends to follow this pattern:
1. Someone gets an idea, like feeding papers into an LLM, and asks it to do something beyond its scope and proper use-case.
2. The LLM, predictably, fails.
3. Users declare not that they misused the tool, but that the tool itself is fundamentally broken.
To my mind it's no different from the steamroller being invented, and people remarking on how well it flattens asphalt. Then a vocal group tries to use this flattening device to iron clothing in bulk, and declares steamrollers useless when they fail at this task.
>swap every Postgres insert query with the corresponding MySQL insert query.
If the data and relationships in those insert queries matter, at some unknown future date you may find yourself cursing your choice to use an LLM for this task. On the other hand you might not ever find out and just experience a faint sense of unease as to why your customers have quietly dropped your product.
I’ve already seen people completely mess things up. It’s hilarious. Someone who thinks they’re in “founder mode” and a “software engineer” because ChatGPT or their Cursor vomited out 800 lines of Python code.
The vileness of hoping people suffer aside, anyone who doesn’t have adequate testing in place is going to fail regardless of whether bad code is written by LLMs or Real True Super Developers.
What vileness? These are people who are gleefully sidestepping things they don't understand and putting tech debt onto others.
I'd say maybe up to 5-10 years ago, there was an attitude of learning something to gain mastery of it.
Today, it seems like people want to skip levels which eventually leads to catastrophic failure. Might as well accelerate it so we can all collectively snap out of it.
The mentality you're replying to confuses me. Yes, people can mess things up pretty badly with AI. But I genuinely don't understand the assumption that anyone using AI is also skipping basic testing or code review.
Right, which is why you go back and validate code. I'm not sure why the automatic assumption that implementing AI in a workflow means you blindly accept the outputs. You run the tool, you validate the output, and you correct the output. This has been the process with every new engineering tool. I'm not sure why people assume first that AI is different, and second that people who use it are all operating like the lowest common denominator AI slop-shop.
In this analogy, are all the steamroller manufacturers loudly proclaiming how well it 10xes the process of bulk ironing clothes?
And is a credulous executive class en masse buying into that steam roller industry marketing and the demos of a cadre of influencer vibe ironers who’ve never had to think about the longer term impacts of steam rolling clothes?
Thank you for mentioning that! What a great example of something an LLM can do pretty well that otherwise can take a lot of time looking up Ansible docs to figure out the best way to do things. I'm guessing the outputs aren't as good as what someone really familiar with Ansible could produce, but it's a great place to start! It's such a good idea that it seems obvious in hindsight now :-)
Exactly, yeah. And once you look over the Ansible, it's a good place to start and expand. I'll often have it emit Helm charts for me as templates, then after the tedious setup of the Helm chart is done, the rest of it is me manually doing the complex parts and customizing in depth.
Plus, it's a generic question; "give me a Helm chart for Velero that does x, y, and z" is as proprietary as me doing a Google search for the same, so you're not giving proprietary source code to OpenAI/wherever, and that's one fewer thing to worry about.
Yeah, I tend to agree. The main reason that I use AI for this sort of stuff is it also gives me something complete that I can then ask questions about, and refine myself. Rather than the fragmented documentation style "this specific line does this" without putting it in the context of the whole picture of a completed sample.
I'm not sure if it's a facet of my ADHD, or mild dyslexia, but I find reading documentation very hard. It's actually a wonder I've managed to learn as much as I have, given how hard it is for me to parse large amounts of text on a screen.
Having the ability to interact with a conversational type documentation system, then bullshit check it against the docs after is a game changer for me.
that's another thing! people are all "just read the documentation". the documentation goes on and on about irrelevant details, how do people not see the difference between "do x with library" -> "code that does x", and having to read a bunch of documentation to make a snippet of code that does the same x?
I'm not sure I follow what you mean, but in general yes. I do find "just read the docs" to be a way to excuse not helping team members. Often docs are not great, and tribal knowledge is needed. If you're in a situation where you're either working on your own and have no access to that, or in a situation where you're limited by the team member's willingness to share, then AI is an OK alternative within limits.
Then there's also the issue that examples in documentation are often very contrived, and sometimes more confusing. So there's value in "work this up to do such and such an operation" sometimes. Then you can interrogate the functionality better.
No, it says that people dislike liars. If you are known for making up things constantly, you might have a harder time gaining trust, even if you're right this time.
1. LLMs have been massively overhyped, including by some of the major players.
2. LLMs have significant problems and limitations.
3. LLMs can do some incredibly impressive things and can be profoundly useful for some applications.
I would go so far as to say that #2 and #3 are hardly even debatable at this point. Everyone acknowledges #2, and the only people I see denying #3 are people who either haven't investigated or are so annoyed by #1 that they're willing to sacrifice their credibility as an intellectually honest observer.
#3 can be true and yet not be enough to make your case. Many failed technologies achieved impressive engineering milestones. Even the harshest critic could probably brainstorm some niche applications for a hallucination machine or whatever.
It says that people need training on what the appropriate use-cases for LLMs are.
This is not the type of report I'd use an LLM to generate. I'd use a database or spreadsheet.
Blindly using and trusting LLMs is a massive minefield that users really don't take seriously. These mistakes are amusing, but eventually someone is going to use an LLM for something important and hallucinations are going to be deadly. Imagine a pilot or pharmacist using an LLM to make decisions.
Some information needs to come from authoritative sources in an unmodified format.
It only makes it worthless for implementations where you require data. There's a universe of LLM use cases that aren't asking ChatGPT to write a report or using it as a Google replacement.
The problem is that, yes, LLMs are great when working on some regular thing for the first time. You can get started at a speed never before seen in the tech world.
But as soon as your use case goes beyond that, LLMs are almost useless.
The main complaint is that, while it's extremely helpful in that specific subset of problems, it's not actually pushing human knowledge forward. Nothing novel is being created with it.
It has created this illusion of being extremely helpful when in reality it is a shallow kind of help.
> If it makes data up, then it is worthless for all implementations.
Not true. It's only worthless for the things you can't easily verify. If you have a test for a function and ask an LLM to generate the function, it's very easy to say whether it succeeded or not.
In some cases, just being able to generate the function with the right types will mostly mean the LLM's solution is correct. Want a `List(Maybe a) -> Maybe(List(a))`? There's a very good chance an LLM will either write the right function or fail the type check.
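For a concrete sense of what that looks like, here is a rough Python analogue of that signature plus the kind of tests you'd check an LLM's output against (the body below is just one plausible implementation of the sort you'd ask the model to produce):

```python
from typing import Optional, TypeVar

T = TypeVar("T")

def sequence(xs: list[Optional[T]]) -> Optional[list[T]]:
    """Python stand-in for List(Maybe a) -> Maybe(List(a)):
    return None if any element is missing, else the full list."""
    out: list[T] = []
    for x in xs:
        if x is None:
            return None  # one missing value poisons the whole result
        out.append(x)
    return out

# The verification step is trivial compared to writing the function:
assert sequence([1, 2, 3]) == [1, 2, 3]
assert sequence([1, None, 3]) is None
assert sequence([]) == []
```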
In a research context, it provides pointers, and keywords for further investigation. In a report-writing context it provides textual content.
Neither of these nor the thousand other uses are worthless. It's when you expect a working and complete work product that it's (subjectively, maybe) worthless, but frankly aiming for that with current-gen technology is a fool's errand.
It mostly says that one of the seriously difficult challenges with LLMs is a meta-challenge:
* LLMs are dangerously useless for certain domains.
* ... but can be quite useful for others.
* The real problem is: They make it real tricky to tell, because most of all they are trained to sound professional and authoritative. They hallucinate papers because that's what authoritative answers look like.
That already means I think LLMs are far less useful than they appear to be. It doesn't matter how amazing a technology is: If it has failure modes and it is very difficult to know what they are, it's dangerous technology no matter how awesome it is when it is working well. It's far simpler to deal with tech that has failure modes but you know about them / once things start failing it's easy to notice.
Add to it the incessant hype, and, oh boy. I am not at all surprised that LLMs have a ridiculously wide range as to detractors/supporters. Supporters of it hype the everloving fuck out of it, and that hype can easily seem justified due to how LLMs can produce conversational, authoritative sounding answers that are explicitly designed to make your human brain go: Wow, this is a great answer!
... but experts read it and can see the problems there. Lots of tech suffers from this. As a random example, plenty of highly upvoted, apparently fantastically written Stack Overflow answers have problems: it's a great answer... for 10 years ago; it's a bad idea today because the answer has been obsoleted.
But between the fact that it's overhyped and particularly complex to determine an LLM answer is hallucinated drivel, it's logical to me that experts are hyperbolic when highlighting the problems. That's a natural reaction when you have a thing that SEEMS amazing but actually isn't.
You, and the OP, are being unfair in your replies. Obviously it's not worthless for all applications, but when LLMs obviously fail in disastrous ways in some important areas, you can't refute that by going "actually it gives me coding advice and generates images".
That's nice and impressive, but there are still important issues and shortcomings. Obligatory, semi-related xkcd: https://xkcd.com/937/
All of these anecdotal stories about "LLM" failures need to go into more detail about what model, prompt, and scaffolding was used. It makes a huge difference. Were they using Deep Research, which searches for relevant articles and brings facts from them into the report? Or did they type a few sentences into ChatGPT Free and blindly take it on faith?
LLMs are _tools_, not oracles. They require thought and skill to use, and not every LLM is fungible with every other one, just like flathead, Phillips, and hex-head screwdrivers aren't freely interchangeable.
If any non-trivial ask of an LLM also requires the prompts/scaffolding to be listed, and independently verified, along with its output, their utility is severely diminished. They should be saving time not giving us extra homework.
That isn't what I'm saying. I'm saying you can't make a blanket statement that LLMs in general aren't fit for some particular task. There are certainly tasks where no LLM is competent, but for others, some LLMs might be suitable while others are not. At least some level of detail beyond "they used an LLM" is required to know whether a) there was user error involved, or b) an inappropriate tool was chosen.
Are they? Every foundation model release includes benchmarks with different levels of performance in different task domains. I don't think I've seen any model advertised by its creating org as either perfect or even equally competent across all domains.
The secondary market snake oil salesmen <cough>Manus</cough>? That's another matter entirely and a very high degree of skepticism for their claims is certainly warranted. But that's not different than many other huckster-saturated domains.
People like Zuckerberg go around claiming most of their code will be written by AI starting sometime this year. Other companies are hearing that and using it as a reason (or false cover) for layoffs. The reality is LLMs still have a way to go before replacing experienced devs, and even when they start getting there, there will be a period of time where we’re learning what we can and can’t trust them with and how to use them effectively and responsibly. Feels like at least a few years from now, but the marketing says it’s now.
In many, many cases those problems are resolved by improvements to the model. The point is that making a big deal about LLM fuck ups in 3 year old models that don't reproduce in new ones is a complete waste of time and just spreads FUD.
Did you read the original tweet? She mentions the models and gives high level versions of her prompts. I'm not sure what "scaffolding" is.
You're right that they're tools, but I think the complaint here is that they're bad tools, much worse than they are hyped to be, to the point that they actually make you less efficient because you have to do more legwork to verify what they're saying. And I'm not sure that "prompt training," which is what I think you're suggesting, is an answer.
I had several bad experiences lately. With Claude 3.7 I asked how to restore a running database in AWS to a snapshot (RDS, if anyone cares). It basically said "Sure, just go to the db in the AWS console and select 'Restore from snapshot' in the actions menu." There was no such button. I later read AWS docs that said you cannot restore a running database to a snapshot, you have to create a new one.
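For what it's worth, the flow the docs actually describe looks roughly like this; a minimal boto3 sketch, with hypothetical identifiers:

```python
import boto3

rds = boto3.client("rds")

# RDS snapshots restore into a NEW instance; there is no
# "restore this running instance in place" button or API call.
rds.restore_db_instance_from_db_snapshot(
    DBInstanceIdentifier="mydb-restored",             # new instance name (made up)
    DBSnapshotIdentifier="mydb-snapshot-2025-03-01",  # existing snapshot (made up)
)
# Typical follow-up: wait for the new instance to become available, repoint
# the application (or swap names), then retire the old database.
```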
I'm not sure that any amount of prompting will make me feel confident that it's finally not making stuff up.
I was responding to the "they used an LLM" story about the Norwegian school report, not the original tweet. The original tweet has a great level of detail.
I agree that hallucination is still a problem, albeit a lot less of one than it was in the recent past. If you're using LLMs for tasks where you are not directly providing it the context it needs, or where it doesn't have solid tooling to find and incorporate that context itself, that risk is increased.
Why do you think these details are important? The entire point of these tools is that I am supposed to be able to trust what they say. The hard work is precisely to be able to spot which things are true and false. If I could do that I wouldn't need an assistant.
> The entire point of these tools is that I am supposed to be able to trust what they say
Hard disagree, and I feel like this assumption might be at the root of why some people seem so down on LLMs.
They’re a tool. When they’re useful to me, they’re so useful they save me hours (sometimes days) and allow me to do things I couldn’t otherwise, and when they’re not they’re not.
It never takes me very long to figure out which scenario I’m in, but I 100% understand and accept that figuring that out is on me and part of the deal!
Sure, if you think you can “vibe code” (or “vibe founder”) your way to massive success by getting LLMs to do stuff you’re clueless about, without any way to check, you’re going to have a bad time, but the fact they can’t (so far) do that doesn’t make them worthless.
Sounds like a user problem, though. When used properly as a tool they are incredible. When you give up 100% trust to them to be perfect it’s you that is making the mistake.
Well yeah, it's fancy autocomplete. And it's extremely amazing what 'fancy autocomplete' is able to do, but making the decision to use an LLM for the type of project you described is effectively just magical thinking. That isn't an indictment against LLM, but rather the person who chose the wrong tool for the job.
Some of the more modern tools do exactly that. If you upload a CSV to Claude, it will not (or at least not anymore) try to process the whole thing. It will read the header, and then ask you what you want. It will then write the appropriate Javascript code and run it to process the data and figure out the stats/whatever you asked it for.
I recently did this with a (pretty large) exported CSV of calories/exercise data from MyFitnessPal and asked it to evaluate it against my goals/past bloodwork etc (which I have in a "Claude Project" so that it has access to all that information + info I had it condense and add to the project context from previous convos).
It wrote a script to extract out extremely relevant metrics (like ratio of macronutrients on a daily basis for example), then ran it and proceeded to talk about the result, correlating it with past context.
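Roughly this kind of thing (Claude writes JavaScript in its analysis tool; this is a Python sketch of the same idea, and the MyFitnessPal column names are guesses, not the real export format):

```python
import csv
from collections import defaultdict

def daily_macro_ratios(path: str) -> dict[str, dict[str, float]]:
    """Sum grams of each macro per day, then express them as ratios of the daily total."""
    totals: dict[str, dict[str, float]] = defaultdict(
        lambda: {"Protein": 0.0, "Carbs": 0.0, "Fat": 0.0}
    )
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            for macro in ("Protein", "Carbs", "Fat"):  # hypothetical column names
                totals[row["Date"]][macro] += float(row.get(macro) or 0)

    ratios = {}
    for day, grams in totals.items():
        total = sum(grams.values()) or 1.0
        ratios[day] = {m: round(g / total, 2) for m, g in grams.items()}
    return ratios
```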
Use the tools properly and you will get the desired results.
Often they will do exactly that; currently their reasoning isn't the best, so you may have to coax it to take the best path. It's also making judgement calls while writing the code, so that's worth checking too. No different to a senior instructing an intern.
"Even a journey of 1,000 miles begins with the first step. Unless you're an AI hyper then taking the first step is the entire journey - how dare you move the goalposts"
"They continue to fabricate links, references, and quotes, like they did from day one." - "I ask them to give me a source for an alleged quote, I click on the link, it returns a 404 error."
Why have these companies not manually engineered out a problem like this by now? Just do a check to make sure links are real. That's pretty unimpressive to me.
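The check being asked for is not exotic. A minimal sketch of the idea (this is not how any provider actually implements it, just the shape of it):

```python
import requests

def link_is_live(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL resolves to something other than an error."""
    try:
        resp = requests.head(url, allow_redirects=True, timeout=timeout)
        if resp.status_code == 405:  # some servers reject HEAD; retry with GET
            resp = requests.get(url, stream=True, timeout=timeout)
        return resp.status_code < 400
    except requests.RequestException:
        return False

# A provider could drop, regenerate, or flag any cited link that fails this
# check before showing it to the user.
```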
There are no fabricated links, references, or quotes in OpenAI's GPT-4.5 + Deep Research.
It's unfortunate the cost of a Deep Research bespoke white paper is so high. That mode is phenomenal for pre-work domain research. You get an analyst's two-week writeup in under 20 minutes, for the low cost of $200/month (though I've seen estimates that such a white paper costs OpenAI over USD 3,000 to produce for you, which explains the monthly limits).
You still need to be a domain expert to make use of this, just as you need to be to make use of an analyst. Both the analyst and Deep Research can generate flawed writeups with similar misunderstandings: mis-synthesizing, misapplication, or missing inclusion of some essential.
Neither analyst nor LLM is a substitute for mastery.
How do people in the future become domain experts capable of properly making use of it if they are not the analyst spending two weeks on the write-up today?
My complaint with Deep Research LLMs is that they don't go deeper than 2 pages of SERPs. I want them to dig down into obscure stuff, not list cursorily relevant peripheral directions. They just seem to do breadth-first rather than depth-first search.
This assessment is incomplete. Large language models are both less and more than these traditional tools. They have not subsumed them, and all can sit together in separate tabs of a single browser window. They are another resource, and when the conditions are right, which is often the case in my experience, they are a startlingly effective tool for navigating the information landscape. The criticism of Gemini is a fair one, and I encountered it yesterday, but perhaps with 50% less entitlement. But Gemini also helped me translate obscure termios APIs to Python from C source code I provided. The equivalent using search and/or Stack Overflow would have required multiple piecemeal searches without guarantees -- and definitely would have taken much more time.
The 404 links are hilarious, like you can't even parse the output and retry until it returns a link that doesn't 404? Even ignoring the billions in valuation, this is so bad for a $20 sub.
The tweeter's complaints sound like a user problem. LLMs are tools. How you use them, when you use them, and what you expect out of them should be based on the fact that they are tools.
I’m sorry but the experience of coding with an LLM is about ten billion times better than googling and stack overflowing every single problem I come across. I’ve stack overflowed maybe like two things in the past half year and I’m so glad to not have to routinely use what is now a very broken search engine and web ecosystem.
How did you measure and compare googling/stack overflow to coding with an LLM? How did you get to the very impressive number ten billion times better?! Can you share your methodology? How have you defined better?
That's part of it. The other part is Google sacrificing product quality for excessive monetization. An example would be YouTube search - first three results are relevant, next 12 results are irrelevant "people also watched", then back to relevant results. Another example would be searching for an item to buy and getting relevant results in the images tab of google, but not the shopping tab.
It’s broken because Google has spent 20+ years promoting garbage content in a self-serving way. No one was able to compete unless they played by Google’s rules, and so all we have left is blog spam and regular spam.
I didn't notice that example. I doubt top-tier models have issues with that. I was more referencing Sabine's mentions of hallucinated citations and papers, which is an issue I also had 2 years ago but is probably solved by Deep Research at this point. She just has massive skill issues and doesn't know what she's doing.
>What are the use cases where the expected performance is high?
o1-pro is probably at top tier human level performance on most small coding tasks and definitely at answering STEM questions. o3 is even better but not released outside of it powering Deep Research.
> This is just not a use case where the expected performance on these tasks is high.
Yet the hucksters hyping AI are falling all over themselves saying AI can do all this stuff. This is where the centi-billion dollar valuations are coming from. It's been years and these super hyped AIs still suck at basic tasks.
When pre-AI shit Google gave wrong answers it at least linked to the source of the wrong answers. LLMs just output something that looks like a link and calls it a day.
"After glowing reviews, I spent $200 to try it out for my research. It hallucinated 8 of 10 references on a couple of different engineering topics. For topics that are well established (literature search), it is useful, although o3-mini-high with web search worked even better for me. For truly frontier stuff, it is still a waste of time."
"I've had the hallucination problem too, which renders it less than useful on any complex research project as far as I'm concerned."
These quotes are from the link you posted. There are a lot more.
The whole point is that an LLM is not a search engine and obviously anyone who treats it as one is going to be unsatisfied. It's just not a sensible comparison. You should compare working with an LLM to working with an old "state of the art" language tool like Python NLTK -- or, indeed, specifying a problem in Python versus specifying it in the form of a prompt -- to understand the unbridgeable gap between what we have today and what seemed to be the best even a few years ago. I understand when a popular science author or my relatives haven't understood this several years after mass access to LLMs, but I admit to being surprised when software developers have not.
Hosted and free or subscription-based DeepResearch like tools that integrate LLMs with search functionality (the whole domain of "RAG" or "Retrieval Augmented Generation") will be elementary for a long time yet simply because the cost of the average query starts to go up exponentially and there isn't that much money in it yet. Many people have and will continue to build their own research tools where they can determine how much compute time and API access cost they're willing to spend on a given query. OCR remains a hard problem, let alone appropriately chunking potentially hundreds of long documents into context length and synthesizing the outputs of potentially thousands of LLM outputs into a single response.
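To make the chunking point concrete, a toy version of that step; real pipelines split on tokens rather than characters and try to respect sentence or section boundaries:

```python
def chunk(text: str, max_chars: int = 4000, overlap: int = 200) -> list[str]:
    """Split text into overlapping windows so facts straddling a boundary survive."""
    assert overlap < max_chars
    chunks, start = [], 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap
    return chunks
```

Each chunk then gets its own LLM call, and the per-chunk answers still have to be synthesized, which is where the cost multiplies.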
Certainly. I agree of course as to the problem of hype and I'm aware of how many people use LLMs today. I tried to emphasize in my earlier post that I can understand why someone like Sabine has the opinion she does -- I'm more confused how there's still similar positions to be found among software developers, evidenced often within Hacker News threads like the one we're in. I don't intend that to refer to you, who clearly has more than a passing knowledge of LLM internals, but more to the original commenter I was responding to.
More than marketing, I think from my experience it's chat, with little control over context, as the primary interface most non-engineers have to LLMs that leads to (mis)expectations of the tool in front of them. Having so little control over what is actually being input to the model makes it difficult to learn to treat a prompt as something more like a program.
It's mostly because of how they were initially marketed. In an effort to drive hype 'we' were promised the world. Remember the "leaks" from Google about an engineer trying to get the word out that they had created a sentient intelligence? In reality Bard, let alone whatever early version he was using, is about as sentient as my left asscheek.
OpenAI did similar things by focusing to the point of absurdity on 'safety' for what was basically a natural language search engine with a habit of inventing nonsensical stuff. But on that same note (and also as you alluded to), I do agree that LLMs have a lot of use as natural language search engines in spite of their proclivity to hallucinate. Being able to describe e.g. a function call (or some esoteric piece of history) and then often get the precise term/event that I'm looking for is just incredibly useful.
But LLMs obviously are not sentient, are not setting us on the path to AGI, or any other such nonsense. They're arguably what search engines should have been 10 or 15 years ago, but anti-competitive monopolization of the industry meant that search engine technology progress basically stalled out, if not regressed for the sake of ads (and individual 'entrepreneurs' becoming better at SEO), about the time Google fully established itself.
> Remember the "leaks" from Google about an engineer trying to get the word out that they had created a sentient intelligence?
I presume you are referring to this Google engineer, who was sacked for making the claim. Hardly an example of AI companies overhyping the tech; precisely the opposite, in fact.
https://www.bbc.co.uk/news/technology-62275326
It seems to be a common human hallucination to imagine that large organisations are conspiring against us.
Corporations are motivated by profit, not doing what's best for humanity. If you need an example of "large organizations conspiring against us," I can give you twenty.
I agree that sometimes organisations conspire against people. My point was, in case it wasn't apparent, the irony that somenameforme was talking about how LLMs were of little use because they hallucinate, whilst apparently hallucinating a conspiracy by AI companies to overhype the technology.
I wasn't making a political point. You see similar evidence-free allegations against international organisations and national government bodies.
> Remember the "leaks" from Google about an engineer trying to get the word out that they had created a sentient intelligence?
That's not what happened. Google stomped hard on Lemoine, saying clearly that he was wrong about LaMDA being sentient ... and then they fired him for leaking the transcripts.
Your whole argument here is based on false information and faulty logic.
Were you perchance noting that according to some people «LLMs ... can hallucinate and create illogical outputs» (you also specified «useless», but that must be a further subset and will hardly create a «litter[ing]» here), but also that some people use «false information and faulty logic»?
Noting that people are imperfect is not a justification for the weaknesses in LLMs. Since around late 2022 some people started stating LLMs are "smart like their cousin", to which the answer remains "we hope that your cousin has a proportionate employment".
If you built a crane that only lifts 15kg, it's no justification that "many people lift 10". The purpose of the crane is to lift as needed, with abundance for safety.
If we build cranes, it is because people are not sufficient: the relative weakness of people is, far from a consolation of weak cranes, the very reason why we want strong cranes. Similarly for intelligence and other qualities.
People are known to use «false information and faulty logic»: but they are not being called "adequate".
> angry at
There's a subculture around here that thinks it normal to downvote without any rebuttal - equivalent to "sneering and leaving" (quite impolite), almost all times it leaves us without a clue about what could be the point of disapproval.
I think you're missing the point. He's pointing out what the atmosphere was/is around LLMs in these discussions, and how that impacts stories like with Lemoine.
I mean, you're right that he's silly and Google didn't want to be part of it, but it was (and is?) taken seriously that: LLMs are nascent AGI, companies are pouring money to get there first, we might be a year or two away. Take these as true, it's at least possible that Google might have something chained up in their basement.
In retrospect, Google dismissed him because he was acting in a strange and destructive way. At the time, it could be spun as just further evidence: they're silencing him because he's right. Could it have created such hysteria and silliness if the environment hadn't been so poisoned by the talk of imminent AGI/sentience?
Which comment claimed that LLMs were marketed as super-intelligence? I'm looking up the chain and I can't see it.
I don't think they were, but I think it's pretty clear they were marketed as being the imminent path to super-intelligence, or something like it. OpenAI were saying GPT-(n-1) is as intelligent as a high school student, GPT-(n) is a university student, GPT-(n+1) will be... something.
That's the whole discussion here: "It's mostly because of how they were initially marketed. In an effort to drive hype 'we' were promised the world. Remember the "leaks" from Google about an engineer trying to get the word out that they had created a sentient intelligence?"
I did not miss any point and that's an ad hominem charge. He misrepresented the facts and based an argument on that misrepresentation and I pointed that out.
"In retrospect, Google dismissed him because he was acting in a strange and destructive way."
No, they dismissed him because he had released Google internal product information, "In retrospect" or otherwise.
> OpenAI did similar things by focusing to the point of absurdity on 'safety' for what was basically a natural language search engine that has a habit of inventing nonsensical stuff.
The focus on safety, and the concept of "AI", preexisted the product. An LLM was just the thing they eventually made; it wasn't the thing they were hoping to make. They applied their existing beliefs to it anyway.
I am worried about them as a substitute for search engines.
My reasoning is that classic google web-scraping and SEO, as shitty as it may be, is 'open-source' (or at least, 'open-citation') in nature - you can 'inspect the sh*t it's built from'.
Whereas LLMs, to me, seem like a Chinese - or Western - totalitarian political system's wet dream: 'we can set up an inscrutable source of "truth" for the people to use, with the _truths_ we intend them to receive'.
We already saw how weird and unsane this was, when they were configured to be woke under the previous regime. Imagining it being configured for 'the other post-truth' is a nightmare.
> Remember the "leaks" from Google about an engineer trying to get the word out that they had created a sentient intelligence?
No, first time I hear about it. I guess the secret to happiness is not following leaks. I had very low expectations before trying LLMs and I’m extremely impressed now.
Not following leaks, or just the news, not living in the real world, not caring about the consequences of reality: anybody can think he's """happy""" with psychedelia and with just living in a private world. But it is the same kind of "happy" that comes with "just smile".
If you did not get the information that there are severe pitfalls - which is, by the way, quite unrelated to the "it's sentient" thing, as we are talking about the faults in the products, not the faults in human fools - you are supposed to see them from your own judgement.
They have their value in analyzing huge amounts of data, for example scientific papers or raw observations, but the popular public ones are mostly trained on stolen/pirated texts off the internet and from the social media clouds the companies control. So this means: bullshit in -> bullshit out. I don't need machines for that; the regular human bullshitters do this job just fine.
> the popular public ones are mostly trained on stolen/pirated texts off the internet
You mean like actual literature, textbooks and scientific papers? You can't get them in bulk without pirating. Thank intellectual property laws.
> from social media clouds the companies control
I.e. conversations of real people about matters of real life.
But if it satisfies your elitist, ivory-towerish vision of "healthy information diet" for LLMs, then consider that e.g. Twitter is where, until now, you'd get most updates from the best minds in several scientific fields. Or that besides r/All, the Reddit dataset also contains r/AskHistorians and other subreddits where actual experts answer questions and give first-hand accounts of things.
The actually important bit though, is that LLM training manages to extract value from both the "bullshit" and whatever you'd call "not bullshit", as the model has to learn to work with natural language just as much as it has to learn hard facts or scientific theories.
Yes, I find the biggest issue in discussing the present state of AI with people outside the field, whether technical or not, is that "machine learning" had only just entered popular understanding: i.e. everyone seems ready today to talk about the limits of training a machine learning model on X limited data set, unable to extrapolate beyond it. The difference between "learning the best binary classifier on a labelled training set" and "exploring the set of all possible programs representable by a deep neural network of whatever architecture to find that which best generates all digitally recorded traces of human beings throughout history" is very far from intuitive to even specialists. I think Ilya's old public discussions of this question are the most insightful for a popular audience, explaining how and why a world model and not simply a Markov chain is necessary to solve the seemingly trivial problem of "predicting the next word in a sequence."
Nobody promised the world. The marketing underpromised and LLMs overdelivered. Safety worries didn't come from marketing, it came from people who were studying this as a mostly theoretical worry for the next 50+ years, only to see major milestones crossed a decade or more before they expected.
Did many people overhype LLMs? Yes, like with everything else (transhumanist ideas, quantum physics). It helps being more picky who one listens to, and whether they're just painting pretty pictures with words, or actually have something resembling a rational argument in there.
Folks really over-index when an LLM is very good for their use case. And most of the folks here are coders, at which they're already good and getting better.
For some tasks they're still next to useless, and people who do those tasks understandably don't get the hype.
Tell a lab biologist or chemist to use an LLM to help them with their work and they'll get very little useful out of it.
Ask an attorney to use it and it's going to miss things that are blindingly obvious to the attorney.
Ask a professional researcher to use it and it won't come up with good sources.
For me, I've had a lot of those really frustrating experiences where I'm having difficulty on a topic and it gives me utterly incorrect junk because there just isn't a lot already published about that topic.
I've fed it tricky programming tasks and gotten back code that doesn't work, and that I can't debug because I have no idea what it's trying to do, or I'm not familiar with the libraries it used.
It sounds like you're trying to use these llms as oracles, which is going to cause you a lot of frustration. I've found almost all of them now excel at imitating a junior dev or a drunk PhD student. For example the other day I was looking at acoustic sensor data and I ran it down the trail of "what are some ways to look for repeating patterns like xyz" and 10 minutes later I had a mostly working proof of concept for a 2nd order spectrogram that reasonably dealt with spectral leakage and a half working mel spectrum fingerprint idea. Those are all things I was thinking about myself, so I was able to guide it to a mostly working prototype in very little time. But doing it myself from zero would've taken at least a couple of hours.
But truthfully 90% of work related programming is not problem solving, it's implementing business logic. And dealing with poor, ever changing customer specs. Which an llm will not help with.
> But truthfully 90% of work related programming is not problem solving, it's implementing business logic. And dealing with poor, ever changing customer specs. Which an llm will not help with.
Au contraire, these are exactly the things LLMs are super helpful at - most business logic in any company is just doing the same thing every other company is doing; there aren't that many unique challenges in day-to-day programming (or business in general). And then, more than half of the work of "implementing business logic" is feeding data in and out, presenting it to the user, and a bunch of other things that boil down to gluing together preexisting components and frameworks - again, a kind of work that LLMs are quite a big time-saver for, if you use them right.
Strongly in agreement. I've tried them and mostly come away unimpressed. If you work in a field where you have to get things right, and it's more work to double check and then fix everything done by the LLM, they're worse than useless. Sure, I've seen a few cases where they have value, but they're not much of my job. Cool is not the same as valuable.
If you think "it can't quite do what I need, I'll wait a little longer until it can" you may still be waiting 50 years from now.
> If you work in a field where you have to get things right, and it's more work to double check and then fix everything done by the LLM, they're worse than useless.
Most programmers understand reading code is often harder than writing it. Especially when someone else wrote the code. I'm a bit amused by the cognitive dissonance of programmers understanding that and then praising code handed to them by an LLM.
It's not that LLMs are useless for programming (or other technical tasks) but they're very junior practitioners. Even when they get "smarter" with reasoning or more parameters their nature of confabulation means they can't be fully trusted in the way their proponents suggest we trust them.
It's not that people don't make mistakes, but they often make reasonable mistakes. LLMs make unreasonable mistakes at random. There's no way to predict the distribution of their mistakes. I can learn that a human junior developer sucks at memory management or something. I can ask them to improve areas they're weak in and check those areas of their work in more detail.
I have to spend a lot of time reviewing all output from LLMs because there's rarely rhyme or reason to their errors. They save me a bunch of typing but replace a lot of my savings with reviews and debugging.
My view is that it will be some time before they can, precisely because of the success in the software domain - not because LLMs aren't capable as a tech, but because data owners and practitioners in other domains will resist the change. From the SWE experience, news reports, financial magazines, etc., many are preparing accordingly, even if it is a subconscious thing. People don't like change, and don't want to be threatened when it is them at risk - no one wants what happened to artists and now SWEs to happen to their profession. They are happy for other professions to "democratize/commoditize" as long as it isn't them - after all, this increases their purchasing power. Don't open source knowledge/products, don't let AI near your vertical domain, continue to command a premium for as long as you can - I've heard variations of this in many AI conversations. It's much easier in oligopoly- and monopoly-like domains and/or domains where knowledge was known to be a moat even when mixed with software, as you have more trust that competitors won't do the same.
For many industries/people, work is a means to earn, not something to be passionate about for its own sake. It's a means to provide for other things in life you are actually passionate about (e.g. family, lifestyle, etc.). In the end AI may get your job eventually, but if it gets you much later vs other industries/domains, you win from a capital perspective as other goods get cheaper and you still command your pre-AI scarcity premium. This makes it easier for them to acquire more assets from the early disrupted industries and shields them from eventual AI takeover.
I'm seeing this directly in software. Fewer new frameworks/libraries/etc. outside the AI domain being published IMO, more apprehension from companies to open source their work and/or expose what they do, etc. Attracting talent is also no longer as strong of a reason to showcase what you do to prospective employees - economic conditions and/or AI make that less necessary as well.
I frequently see news stories where attorneys get in trouble for using LLMs, because they cite hallucinated case law (e.g.). If they didn't get caught, that would look the same as using them "productively".
Asking the LLM for relevant case law and checking it up - productive use of LLM. Asking the LLM to write your argument for you and not checking it up - unproductive use of LLM. It's the same as with programming.
>Asking the LLM for relevant case law and checking it up - productive use of LLM
That's a terrible use for an LLM. There are several deterministic search engines attorneys use to find relevant case law, where you don't have to check to see if the cases actually exist after it produces results. Plus, the actual text of the case is usually very important, and isn't available if you're using an LLM.
Which isn't to say they're not useful for attorneys. I've had success getting them to do some secretarial and administrative things. But for the core of what attorneys do, they're not great.
For law firms creating their own repositories of case law, having LLMs search via summaries, and then dive into the selected cases to extract pertinent information seems like an obvious great use case to build a solution using LLMs.
The orchestration of LLMs that will be reading transcripts, reading emails, reading case law, and preparing briefs with sources is unavoidable in the next 3 years. I don’t doubt multiple industry-specialized solutions are already under development.
Just asking ChatGPT to make your case for you is missing the opportunity.
If anyone is unable to get Claude 3.7 or Gemini 2.5 to accelerate their development work, I have to doubt their sentience at this point. (Or, more likely, doubt that they’re actively testing these things regularly.)
Law firms don't create their own repos of case law. They use a database like westlaw or lexis. LLMs "preparing briefs with sources" would be a disaster and wholly misunderstands what legal writing entails.
I find it very useful to review the output and consider its suggestions.
I don’t trust it blindly, and I often don’t use most of what it suggests; but I do apply critical thinking to evaluate what might be useful.
The simplest example is using it as a reverse dictionary. If I know there’s a word for a concept, I’ll ask an LLM. When I read the response, I either recognize the word or verify it using a regular dictionary.
I think a lot of the contention in these discussions is because people are using it for different purposes: it's unreliable for some purposes and it is excellent at others.
> Asking the LLM for relevant case law and checking it up - productive use of LLM.
Only if you're okay with it missing stuff. If I hired a lawyer, and they used a magic robot rather than doing proper research, and thus missed relevant information, and this later came to light, I'd be going after them for malpractice, tbh.
Surely this was meant ironically, right? You must've heard of at least one of the many cases involving lawyers doing precisely what you described and ending up presenting made up legal cases in court. Guess how that worked out for them.
The uses that they cited to me were "additional pair of eyes in reviewing contracts," and, "deep research to get started on providing a detailed overview of a legal topic."
Honestly it's worse than this. A good lab biologist/chemist will try to use it, understand that it's useless, and stop using it. A bad lab biologist/chemist will try to use it, think that it's useful, and then it will make them useless by giving them wrong information. So it's not just that people over-index when it is useful, they also over-index when it's actively harmful but they think it's useful.
You think good biologists never need to summarize work into digestible language, or fill out multiple huge, redundant grant applications with the same info, or reformat data, or check that a writeup accurately reflects data?
I’m not a biologist (good or bad) but the scientists I know (who I think are good) often complain that most of the work is drudgery unrelated to the science they love.
Sure, lots of drudgery, but none of your examples are things that you could trust an LLM to do correctly when correctness counts. And correctness always counts in science.
Edit to add: and regardless, I'm less interested in the "LLM's aren't ever useful to science" part of the point. The point that actual LLM usage in science will mostly be for cases where they seem useful but actually introduce subtle problems is much more important. I have observed this happening with trainees.
The problem Sabine tries to communicate is that reality is different from what the cash-heads behind the main commercial models are trying to portray. They push the narrative that they’ve created something akin to human cognition, when in reality they’ve just optimised prediction algorithms on an unprecedented scale. They are trying to say that they created Intelligence, which is the ability to acquire and apply knowledge and skills, but we all know the only real Intelligence they are creating is the collection of information of military or political value.
The technology is indeed amazing and very amusing, but like all the good things in the hands of corporate overlords, it will be slowly turning into profit-milking abomination.
> They push the narrative that they’ve created something akin to human cognition
This is your interpretation of what these companies are saying. I'd love to see whether any company has specifically said anything like that.
Out of the last 100 years, how many inventions have been made that could make any human awe like LLMs do right now? How many things from today, when brought back to 2010, would make the person using them feel like they're being tricked or pranked? We already take them for granted even though they've only been around for less than half a decade.
LLMs aren't a catch-all solution to the world's problems, or something that is going to help us in every facet of our lives, or an accelerator for every industry that exists out there. But at no point in history could you talk to your phone about general topics, get information, practice language skills, build an assistant that teaches your kid the basics of science, use something to accelerate your work in many different ways, etc...
Looking at LLMs shouldn't be boolean; it shouldn't be a choice between "they're the best thing ever invented" and "they're useless". But it seems like everyone presents the issue in this manner, and Sabine is part of that problem.
No major company directly states "We have created human-like intelligence," but they intentionally use suggestive language that leads people to think AI is approaching human cognition. This helps with hype, investment, and PR.
>I'd love to see whether any company has specifically said anything like that.
1. Microsoft researchers: Sparks of Artificial General Intelligence: Early experiments with GPT-4 - https://arxiv.org/abs/2303.12712
2. "GPT-4 is not AGI, but it does exhibit more general intelligence than previous models." - Sam Altman
3. Musk has claimed that AI is on the path to "understanding the universe." His branding of Tesla's self-driving AI as "Full Self-Driving" (FSD) also misleadingly suggests a level of autonomous reasoning that doesn't exist.
4. Meta's AI chief scientist, Yann LeCun, has repeatedly said they are working on giving AI "common sense" and "world models" similar to how humans think.
>Out of the last 100 years how many inventions have been made that could make any human awe like llms do right now?
ELIZA is an early natural language processing computer program developed from 1964 to 1967.
ELIZA's creator, Weizenbaum, intended the program as a method to explore communication between humans and machines. He was surprised and shocked that some people, including his secretary, attributed human-like feelings to the computer program. That was 60 years ago.
So as you can see, us humans are not too hard to fool with this.
ELIZA was not a natural language processor, and the fact that some people were easily fooled by a program that produced canned responses based on keywords in the text but was presented as a psychotherapist is not relevant to the issue here--it's a fallacy of affirmation of the consequent.
Also,
"4. Meta's AI chief scientist, Yann LeCun, has repeatedly said they are working on giving AI "common sense" and "world models" similar to how humans think."
completely misses the mark. That LLMs don't do this is a criticism from old-school AI researchers like Gary Marcus; LeCun is saying that they are addressing the criticism by developing the sorts of technology that Marcus says are necessary.
> they intentionally use suggestive language that leads people to think AI is approaching human cognition. This helps with hype, investment, and PR.
As do all companies in the world. If you want to buy a hammer, the company will sell it as the best hammer in the world. It's the norm.
I don't know exactly what your point is with ELIZA?
> So as you can see, us humans are not too hard to fool with this.
I mean ok? How is that related to having a 30 minute conversation with ChatGPT where it teaches you a language? Or Claude outputting an entire application in a single go? Or having them guide you through fixing your fridge by uploading the instructions? Or using NotebookLM to help you digest a scientific paper?
I'm not saying LLMs are not impressive or useful — I'm pointing out that the corporations behind commercial AI models are capitalising on our emotional response to natural language prediction. This phenomenon isn't new – Weizenbaum observed it 60 years ago, even with the simplest of algorithms like ELIZA.
Your example actually highlights this well. AI excels at language, so it’s naturally strong in teaching (especially for language learning ;)). But coding is different. It’s not just about syntax; it requires problem-solving, debugging, and system design — areas where AI struggles because it lacks true reasoning.
There's no denying that when AI helps you achieve or learn something new, it's a fascinating moment — proof that we're living in 2025, not 1967. But the more commercialised it gets, the more mythical and misleading the narrative becomes.
> system design — areas where AI struggles because it lacks true reasoning.
Others addressed code, but with system design specifically - this is more of an engineering field now, in that there's established patterns, a set of components at various levels of abstraction, and a fuck ton of material about how to do it, including but not limited to everything FAANG publishes as preparatory material for their System Design interviews. At this point in time, we have both a good theoretical framework and a large collection of "design patterns" solving common problems. The need for advanced reasoning is limited, and almost no one is facing unique problems here.
I've tested it recently, and suffice it to say, Claude 3.7 Sonnet can design systems just fine - in fact much better than I'd expect a random senior engineer to. Having the breadth of knowledge and being really good at fitting patterns is a big advantage it has over people.
> They push the narrative that they’ve created something akin to human cognition
I am saying they're not doing that, they're doing sales and marketing and it's you that interprets this as possible/true. In my analogy if the company said it's a hammer that can do anything, you wouldn't use it to debug elixir. You understand what hammers are for and you realize the scope is different. Same here. It's a tool that has its uses and limits.
> Your example actually highlights this well. AI excels at language, so it’s naturally strong in teaching (especially for language learning ;)). But coding is different. It’s not just about syntax; it requires problem-solving, debugging, and system design — areas where AI struggles because it lacks true reasoning.
I disagree, since I use it daily and Claude is really good at coding. It's saving me a lot of time. It's not gonna build a new Waymo, but I don't expect it to. But this is beside the point. In the original tweet, what Sabine is implying is that it's useless and OpenAI should be worth less than a shoe factory. In fact this is a very poor way to look at LLMs and their value, and both ends of the spectrum are problematic (those who say it's a catch-all AGI and those who say hurr, it couldn't solve P versus NP, it's trash).
I think one difference between a hammer and an LLM is that hammers have existed since forever, so common sense is assumed to be there as to what their purpose is. For LLMs though, people are still discovering on a daily basis to what extent they can usefully apply them, so it's much easier to take such promises made by companies out of context if you are not knowledgeable/educated on LLMs and their limitations.
Person you replied to:
> they intentionally use suggestive language that leads people to think AI is approaching human cognition. This helps with hype, investment, and PR.
Your response:
> As do all companies in the world. If you want to buy a hammer, the company will sell it as the best hammer in the world. It's the norm.
As a programmer (and GOFAI buff) for 60 years who was initially highly critical of the notion of LLMs being able to write code because they have no mental states, I have been amazed by the latest incarnations being able to write complex functioning code in many cases. There are, however, specific ways that not being reasoners is evident ... e.g., they tend to overengineer because they fail to understand that many situations aren't possible. I recently had an example where one node in a tree was being merged into another, resulting in the child list of the absorbed node being added to the child list of the kept node. Without explicit guidance, the LLM didn't "understand" (that is, its response did not reflect) that a child node can only have one parent so collisions weren't possible.
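(For what it's worth, here is a minimal sketch of the situation described, with hypothetical names of my own; the point is that because every child node has exactly one parent, absorbing one node's children into another's cannot produce duplicates, so collision handling is dead code:)

    # Hypothetical reconstruction of the merge described above (names are mine).
    # Each child node has exactly one parent, so when `absorbed` is merged into
    # `kept`, none of absorbed's children can already be in kept.children;
    # a plain extend is enough, and any collision check is unnecessary.
    class Node:
        def __init__(self, value):
            self.value = value
            self.parent = None
            self.children = []

    def merge_into(kept, absorbed):
        """Merge `absorbed` into `kept`: re-parent its children, then drop it."""
        for child in absorbed.children:
            child.parent = kept                    # its only parent was `absorbed`
        kept.children.extend(absorbed.children)    # duplicates are impossible here
        absorbed.children = []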
> proof that we’re living in 2025, not 1967. But the more commercialised it gets, the more mythical and misleading the narrative becomes
You seem to be living in 2024, or 2023. People generally have far more pragmatic expectations these days, and the companies are doing a lot less overselling ... in part because it's harder to come up with hype that exceeds the actual performance of these systems.
How many examples of CEOs writing shit like that can you name? I can name more than one. Elon had been saying that camera-driven Level 5 autonomous driving would be ready in 2021. Did you believe him?
Elon? Never did, and for the record, I also never really understood his fanboys. I never even bought a Tesla. And no, besides these two guys, I don't really remember many other CEOs making such revolutionary statements. That is usually the case when people understand their technology and are not ready to bullshit. There is one small differentiation though: at least the self-driving car hype was believable, because it seemed almost like a finite-search problem, something along the lines of: how hard could it be to process X input signals from lidars and image frames and marry them to an advanced variation of what is basically a PID controller? And at least there is a defined use-case. With genAI, we have no idea what the problem definition or even the problem space is, and the main use-case that the companies seem to be pushing down our throats (aside from code assistants) is "summarising your email" and chatting with your smartphone, for lonely people. Ew, thanks, but no thanks.
No mate, not everyone is trying hard to prove some guy on the Internet wrong. I do remember these two but to be honest, they were not on top of my mind in this context, probably because it's a different example - or what are you trying to say? That the people running AI companies should go to jail for deceiving their investors? This is different to Theranos. Holmes actively marketed and PRESENTED a "device" which did not exist as specified (they relied on 3rd party labs doing their tests in the background). For all that we know, OpenAI and their ilk are not doing that really. So you're on thin ice here. Amazon came close though, with their failed Amazon Go experiment, but they only invested their own money, so no damage was done to anyone. In either case your example is showing what? That lying is normal in the business world and should be done by the CEOs as part of their job description? That they should or should not go to jail for it? I am really missing your point here, no offence.
> In either case your example is showing what? That lying is normal in the business world and should be done by the CEOs as part of their job description? That they should or should not go to jail for it? I am really missing your point here, no offence.
If you run through the message chain you'll see first that the comment OP is claiming companies market LLMs as AGI, and then the next guy quotes Altman's tweet to support it. I am saying companies don't claim LLMs are AGI and that CEOs are doing CEO things; my examples are Elon (who didn't go to jail, btw) and the other two who did.
> For all that we know, OpenAI and their ilk are not doing that really.
I think you completely missed the point. Altman is definitely engaging in 'creative' messaging, as do other GenAI CEOs. But unlike Holmes and others, they are careful to wrap it in conditionals, future tense, and vague corporate speak about how something "feels" like this or that, rather than saying it definitely is this or that. Most of us dislike the fact that they are indeed implying this stuff is almost AGI, just around the corner, just a few more years, just a few more hundred billion dollars wasted on datacenters. Meanwhile we can see on a day-to-day basis that their tools are just advanced text generators. Anyone who finds them 'mindblowing' clearly does not have a complex enough use case.
I think you are missing the point. I never said it's the same nor is that what I am arguing.
> Anyone who finds them 'mindblowing' clearly does not have a complex enough use case.
What is the point of LLMs? If their only point is complex use cases then they're useless, let's throw them away. If their point/scope/application is wider and they're doing something for a non-negligible percentage of people, then who are you to gauge whether they deserve to be mindblowing to someone or not, regardless of their use case?
What is the point of LLMs? It seems nobody really knows, including the people selling them. They are a solution in search of a problem. But if you figure it out in the meanwhile, make sure to let everyone know. Personally I'd be happy with just having back Google as it was between roughly 2006 and 2019 (RIP), in place of the overly verbose statistical parrots.
> Out of the last 100 years how many inventions have been made that could make any human awe like llms do right now?
Lots, e.g. vacuum cleaners.
> But at no point in history could you talk to your phone
You could always "talk" to your phone just like you could "talk" to a parrot or a dog. What does that even mean?
If we're talking about LLMs, I still haven't been able to have a real conversation with one. There's too much lag for it to feel like a conversation, and it often doesn't reply with anything related.
> If we're talking about LLMs, I still haven't been able to have a real conversation with one. There's too much lag for it to feel like a conversation, and it often doesn't reply with anything related.
I don't believe this one bit. But keep on trucking.
Of course they aren't "real" conversations but I can dialog with LLMs as a means of clarifying my prompts. The comment about parrots and dogs is made in bad faith.
By your own admission, those are not dialogues, but merely query optimisations in an advanced query language. Like how you would tune an SQL query until you get the data you are expecting to see. That's what it is with LLMs.
> The comment about parrots and dogs is made in bad faith
Not necessarily. (Some aphonic, adactyl downvoters seem to have tried to nudge you into noticing that your idea above is against the entailed spirit of the guidelines.)
The poster may have meant that, for the use natural to him, the results feel about as useful as a discussion with a good animal. "Clarifying one's prompts" may be effective in some cases, but it's probably not what others seek. It is possible that many want the good old combination of "informative" and "insightful": in practice there may be issues with both.
> "Clarifying one's prompts" may be effective in some cases but it's probably not what others seek
It's not even that. Can the LLM run away, stop the conversation, or even say no? It's like your boss "talking" to you about the task without giving you a chance to respond. Is that a talk? It's one-way.
E.g. ask the LLM who invented Wikipedia. It will respond with "facts". If I ask a friend, the reply might be "look it up yourself". That is a real conversation. Until an LLM can do that, it isn't one.
Even parrots and dogs can respond with something other than the exact reply you forced them into.
> This is your interpretation of what these companies are saying. I'd love to see if some company specifically claimed anything like that?
What is the layman to make of the claim that we now have “reasoning” models? Certainly sounds like a claim of human-like cognition, even though the reality is different.
Studies have shown that corvids are capable of reasoning. Does that sound like a claim of human level cognition?
I think you’re going too far in imagining what one group of people will make of what another group of people is saying, without actually putting yourself in either group.
Much as I agree with the point about overhyping from companies, I'd be more sympathetic to this point of view if she acknowledged the merits of the technology.
Yes, it hallucinates and if you replace your brain with one of these things, you won't last too long. However, it can do things which, in the hands of someone experienced, are very empowering. And it doesn't take an expert to see the potential.
As it stands, it sounds like a case of "it's great in practice but the important question is how good it is in theory."
I use LLMs. They're somewhat useful if you're on a non-niche problem. They're also useful instead of search engines, but that's because search has been enshittified more than because an LLM is better.
However 90% of the marketing material about them is simply disgusting. The bigwigs sound like they're spreading a new religion, and most enthusiasts sound like they're new converts to some sect.
If you're marketing it as a tool, fine. If you're marketing it as the third and fourth coming of $DEITY, get lost.
> I use LLMs. They're somewhat useful if you're on a non-niche problem. They're also useful instead of search engines...
The problem for me is that I could use that type of assistance precisely when I hit that "niche problem" zone. Non-niche problems are usually already solved.
Like search. Popular search engines like Google and Bing are mostly garbage because they keep trying to shove gen AI in my face with made up answers. I have no such problems with my SearxNG instance.
> I could use that type of assistance precisely when I hit that "niche problem" zone
Tough luck. On the other hand, we're still justified in asking for money to do the niche problems with our fleshy brains, right? In spite of the likes of Altman saying every week that we'll be obsoleted in 5 years by his products. Like ... cold fusion? Always 5 years away?
[I have more hope for cold fusion than these "AIs" though.]
> Popular search engines like Google and Bing are mostly garbage because they keep trying to shove gen AI in my face with made up answers.
No, they became garbage significantly before "AI". Google at least has gradually reduced the number of results returned and expanded the search scope to the point that when you want a reminder of the I2C API syntax on a Raspberry Pi, they return 20 beginner tutorial results that show you how to unpack the damn thing and do the first login instead.
I completely agree about the marketing material. I'm not sure about 90% but that's not something I have a strong opinion on. The stream from the bigwigs is the same song being played in a different tune and I'm inoculated to it.
I'm not marketing it. I'm not a marketer. I'm a developer trying to create an informed opinion on its utility and the marketing speak you criticize is far away from the truth.
The problem is this notion that it's just complete bullshit. The way it's worded irks me. "I genuinely don't understand...". It's quite easy to see the utility, and acknowledging that doesn't, in any way, detract from valid criticisms of the technology and the people who peddle it.
Exactly. It’s so strange to read so many comments that boil down to “because some marketing people are over-promising, I will retaliate by choosing to believe false things”
But it’s not the marketers building the products. This is like saying “because the car salesman lied about this Prius’ gas mileage, I’ll retaliate by refusing to believe hybrids are any better than pure ICE cars and will buy a pickup”.
It hurts nobody but the person choosing ignorance.
I hate to bring an ad hominem into this, but Sabine is a YouTube influencer now. That's her current career. So I'd assume this Tweet storm is also pushing a narrative on its own, because that's part of doing the work she chose to do to earn a living.
While true, I think this is more likely a question of framing or anchoring — I am continuously impressed and surprised by how good AI is, but I recognise all the criticisms she's making here. They're amazing, but at the same time they make very weird mistakes.
They actually remind me of myself, as I experience being a native English speaker now living in Berlin and attempting to use a language I mainly learned as an adult.
I can often appear competent in my use of the language, but then I'll do something stupid like asking someone in the office if we have a "Gabelstapler" I can borrow — Gabelstapler is "forklift truck", I meant to ask for a stapler, which is "Tacker" or "Hefter", and I somehow managed to make this mistake directly after carefully looking up the word. (Even this is a big improvement for me, as I started off like Officer Crabtree from 'Allo 'Allo!)
What you have done there is discount statements that may build up a narrative and still remain fair... On what basis? Possibly because they do not match your own narrative?
LLMs seem akin to parts of human cognition, maybe the initial fast-thinking bit when ideas pop up in a second or two. But any human writing a review with links to sources would look them up and check that they are the right ones and match the initial idea. Current LLMs don't seem to do that, at least not the ones Sabine complains about.
Akin to human cognition but still a few bricks short of a load, as it were.
You lay the rhetoric on so thick (“cash heads”, “pushing the narrative”, “corporate overlords”, “profit-making abomination”) that it’s hard to understand your actual claim.
Are you trying to say that LLMs are useful now but you think that will stop being the case at some point in the future?
I was trying to say that LLMs will not become more than what they are, especially since they are tools for profit in the modern system. It’s like an iPhone — the latest iPhone 16 Pro is, of course, better than the iPhone 3G, but conceptually, there is nothing new. The same goes for LLMs. In 15 years, we will probably have the same hallucinating LLM, just 10% thinner and 20% faster.
The tech industry, especially big corporations, doesn’t chase innovation; it chases repeatable, predictable profit.
If we're calling out names, what about Roger Penrose, John Searle, Stuart Hameroff, Hubert Dreyfus, Henry Stapp? These are very intelligent people and I suggest getting acquainted with their work. Neural scaling laws are real, and no matter whether you put 10e-8 petaFLOP/days of compute or 200000 petaFLOP/days into training, you will hit an irreducible error constant at the Efficient Compute Frontier.
I'm not a functionalist and my belief is that AI — especially LLMs — will never achieve real understanding or consciousness, no matter how much we scale them. Language prediction is just a computation, but human thought is more than that.
Calling out names was just an argument for not dismissing the claim that AI is a thing "everyone knows" is fake.
Above you wrote "we all know the only real Intelligence ... is" as your support for attributing venal motives to people taking AI progress seriously. OK, now I know your basis for that claim. I've read three of the guys you mention, agree they're intelligent, and except for Searle they have some good things to say. But it's really unconvincing as support for an AI-is-fake claim, and especially for an everyone-knows claim.
Look man, and I'm saying this not to you but to everyone who is in this boat; you've got to understand that after a while, the novelty wears off. We get it. It's miraculous that some gigabytes of matrices can possibly interpret and generate text, images, and sound. It's fascinating, it really is. Sometimes, it's borderline terrifying.
But, if you spend too much time fawning over how impressive these things are, you might forget that something being impressive doesn't translate into something being useful.
Well, are they useful? ... Yeah, of course LLMs are useful, but we need to remain somewhat grounded in reality. How useful are LLMs? Well, they can dump out a boilerplate React frontend to a CRUD API, so I can imagine it could very well be harmful to a lot of software jobs, but I hope it doesn't bruise too many egos to point out that dumping out yet another UI that does the same thing we've done 1,000,000 times before isn't exactly novel. So it's useful for some software engineering tasks. Can it debug a complex crash? So far I'm around zero for ten, and believe me, I'm trying. From Claude 3.7 to Gemini 2.5, Cursor to Claude Code, it's really hard to get these things to work through a problem the way anyone above the junior dev level can. Almost universally, they just keep digging themselves deeper until they eventually give up and try to null out the code so that the buggy code path doesn't execute.
So when Sabine says they're useless for interpreting scientific publications, I have zero trouble believing that. Scoring high on some shitty benchmarks whose solutions are in the training set is not akin to generalized knowledge. And these huge context windows sound impressive, but dump a moderately large document into them and it's often a challenge to get them to actually pay attention to the details that matter. The best shot you have by far is if the document you need it to reference definitely was already in the training data.
It is very cool and even useful to some degree what LLMs can do, but just scoring a few more points on some benchmarks is simply not going to fix the problems current AI architecture has. There is only one Internet, and we literally lit it on fire to try to make these models score a few more points. The sooner the market catches up to the fact that they ran out of Internet to scrape and we're still nowhere near the singularity, the better.
100% this. I think we should start producing independent evaluations of these tools for their usefulness, not for whatever made-up or convoluted evaluation index OpenAI, Google, or Anthropic throw at us.
Hardly. I have pretty much been using LLMs at least weekly (most of the time daily) since GPT-3.5. I am still amazed. It's really, really hard for me not to be bullish.
It kinda reminds me of the days when I learned the Unix-like command line. At least once a week, I shouted to myself: "What? There is a one-liner that does that? People use awk/sed/xargs this way??" That's how I feel about LLMs so far.
I tried LLMs for generating shell snippets. Mixed bag for me. They seem to have a hard time making portable awk/sed commands. They also really overcomplicate things; you really don't need to break out awk for most simple file renaming tasks. For lesser-used utilities, all bets are off.
Yesterday Gemini 2.5 Pro suggested running "ps aux | grep filename.exe" to find a Wine process (pgrep is the much better way to go for that, but it's still wrong here) and get the PID, then pass that into "winedbg --attach" which is wrong in two different ways, because there is no --attach argument and the PID you pass into winedbg needs to be the Win32 one not the UNIX one. Not an impressive showing. (I already knew how to do all of this, but I was curious if it had any insights I didn't.)
For people with less experience I can see how getting e.g. tailored FFmpeg commands generated is immensely useful. On the other hand, I spent a decent amount of effort learning how to use a lot of these tools and for most of the ways I use them it would be horrific overkill to ask an LLM for something that I don't even need to look anything up to write myself.
Will people in the future simply not learn to write CLI commands? Very possible. However, I've come to a different, related conclusion: I think that these areas where LLMs really succeed in are examples of areas where we're doing a lot of needless work and requiring too much arcane knowledge. This counts for CLI usage and web development for sure. What we actually want to do should be significantly less complex to do. The LLM actually sort of solves this problem to the extent that it works, but it's a horrible kludge solution. Literally converting video files and performing basic operations on them should not require Googling reference material and Q&A websites for fifteen minutes. We've built a vastly overly complicated computing environment and there is a real chance that the primary user of many of the interfaces will eventually not even be humans. If the interface for the computer becomes the LLM, it's mostly going to be wasted if we keep using the same crappy underlying interfaces that got us into the "how do I extract tar file" problem in the first place.
They really don’t. People say this all the time, but you give any project a little time and it evolves into a special unique snowflake every single time.
That’s why every low code solution and boilerplate generator for the last 30 years failed to deliver on the promises they made.
I agree some will evolve into more, but lots of them won't. That's why Shopify, WordPress and others exist - most commercial websites are just online business cards or small shops. Designers and devs are hired to work on them all the time.
If you’re hiring a dev to work on your Shopify site, it’s most likely because you want to do something non-standard. By the time the dev gets done with it, it will be a special unique snowflake.
If your site has users, it will evolve. I’ve seen users take what was a simple trucking job posting form and repurpose an unused “trailer type” field to track the status of the job req.
Every single app that starts out as a low code/no code solution given enough time and users will evolve beyond that low code solution. They may keep using it, but they’ll move beyond being able to maintain it exclusively through a low code interface.
And most software engineering principles are about dealing with this evolution:
- Architecture (making it easy to adjust parts of the codebase and to understand it)
- Testing (making sure the current version works and that future versions won't break it)
- Requirements (describing the current version and the planned changes)
- ...
If a project was just a clone, I'm sure people would just buy the existing version and be done with it. And sometimes they do; then a unique requirement comes along and the whole process comes back into play.
If your job can be hollowed out into >90% entering prompts into AI text editors, you won't have to worry about continuing to be paid to do it every day for very long.
> Well, are they useful? ... Yeah, of course LLMs are useful, but we need to remain somewhat grounded in reality. How useful are LLMs?
They are useful enough that they can passably replace (much more expensive) humans in a lot of noncritical jobs, thus being a tangible tool for securing enterprise bottom lines.
From what I've seen in my own job and observing what my wife does (she's been working with the things on very LLM-centric processes and products in a variety of roles for about three years) not a lot of people are able to use them to even get a small productivity boost. Anyone less than very-capable trying to use them just makes a ton more work for someone more expensive than they are.
They're still useful, but they're not going to make cheap employees wildly more productive, and outside maybe a rare, perfect niche, they're not going to increase expensive employees' productivity so much that you can lay off a bunch of the cheap ones. Like, they're not even close to that, and haven't really been getting much closer despite improvements.
> they can dump out a boilerplate React frontend to a CRUD API
This is so clearly biased that it borders on parody. You can only get out what you put in.
The real use case of current LLMs is that any project that would previously require collaboration can now be done solo with a much faster turnaround. Of course in 20 years when compute finally catches up they will just be super intelligent AGI
I have Cursor running on my machine right now. I am even paying for it. This is in part because no matter what happens, people keep professing, basically every single time a new model is released, that it has finally happened: programmers are finally obsolete.
Despite the ridiculous hype, though, I have found that these things have crossed into usefulness. I imagine for people with less experience, these tools are a godsend, enabling them to do things they definitely couldn't do on their own before. Cool.
Beyond that? I definitely struggle to find things I can do with these tools that I couldn't do better without. The main advantage so far is that these tools can do these things very fast and relatively cheaply. Personally, I would love to have a tool that I can describe what I want in detailed but plain English and have it be done. It would probably ruin my career, but it would be amazing for building software. It'd be like having an army of developers on your desktop computer.
But, alas, a lot of the cool shit I'd love to do with LLMs doesn't seem to pan out. They're really good at TypeScript and web stuff, but their proficiency definitely tapers off as you veer out. It seems to work best when you can find tasks that basically amount to translation, like converting between programming languages in a fuzzy way (e.g. trying to translate idioms). What's troubling me the most is that they can generate shitloads of code but basically can't really debug the code they write beyond the most entry-level problem-solving. Reverse engineering also seems like an amazing use case, but the implementations I've seen so far definitely are not scratching the itch.
> Of course in 20 years when compute finally catches up they will just be super intelligent AGI
I am betting against this. Not the "20 years" part, it could be months for all we know; but the "compute finally catches up" part. Our brains don't burn kilowatts of power to do what they do, yet given basically unbounded time and compute, current AI architectures are simply unable to do things that humans can, and there aren't many benchmarks that are demonstrating how absolutely cataclysmically wide the gap is.
I'm certain there's nothing magical about the meat brain, as much as that is existentially challenging. I'm not sure that this follows through to the idea that you could replicate it on a cluster of graphics cards, but I'm also not personally betting against that idea, either. On the other hand, getting the absurd results we have gotten out of AI models today didn't involve modest increases. It involved explosive investment in every dimension. You can only explode those dimensions out so far before you start to run up against the limitations of... well, physics.
Maybe understanding what LLMs are fundamentally doing to replicate what looks to us like intelligence will help us understand the true nature of the brain or of human intelligence, hell if I know, but what I feel most strongly about is this: I do not believe LLMs are replicating some portion of human intelligence. They are very obviously neither a subset nor a superset of it, nor particularly close to either. They are some weird entity that overlaps with it in ways we don't fully comprehend yet.
I see a difference between seeing them as valuable in their current state vs being "bullish about LLMs" in the stock market sense.
The big problem with being bullish in the stock market sense is that OpenAI isn't selling the LLMs that currently exist to their investors, they're selling AGI. Their pitch to investors is more or less this:
> If we accomplish our goal we (and you) will have infinite money. So the expected value of any investment in our technology is infinite dollars. No, you don't need to ask what the odds are of us accomplishing our goal, because any percent times infinity is infinity.
Since OpenAI and all the founders riding on their coat tails are selling AGI, you see a natural backlash against LLMs that points out that they are not AGI and show no signs of asymptotically approaching AGI—they're asymptotically approaching something that will be amazing and transformative in ways that are not immediately clear, but what is clear to those who are watching closely is that they're not approaching Altman's promises.
The AI bubble will burst, and it's going to be painful. I agree with the author that that is inevitable, and it's shocking how few people see it. But also, we're getting a lot of cool tech out of it and plenty of it is being released into the open and heavily commoditized, so that's great!
I think that people who don't believe LLMs to be AGI are not very good at Venn diagrams. Because they certainly are artificial, general, and intelligent according to any dictionary.
Good grief. You are deeply confused and/or deeply literal. That's not the accepted definition of AGI in any sense. One does not evaluate each word as an isolated component when testing the truth of a statement containing an open compound word. Does your "living room" have organs?
It is that, or you can't recognize a tongue-in-cheek comment on goalpost shifting. The wiki page you linked has the original definition of the term from 1997; dig it up. Better yet, look at the history of that page in the Wayback Machine and see with your own eyes how the ChatGPT release changed it.
For reference, 1997 original: By advanced artificial general intelligence, I mean AI systems that rival or surpass the human brain in complexity and speed, that can acquire, manipulate and reason with general knowledge, and that are usable in essentially any phase of industrial or military operations where a human intelligence would otherwise be needed.
2014 wiki requirements: reason, use strategy, solve puzzles, and make judgments under uncertainty; represent knowledge, including commonsense knowledge; plan; learn; communicate in natural language; and integrate all these skills towards common goals.
No, it's really not. Joining words into a compound word enables the new compound to take on new meaning and evolve on its own, and if it becomes widely used as a compound it always does so. The term you're looking for if you care to google it is an "open compound noun".
A dog in the sun may be hot, but that doesn't make it a hot dog.
You can use a towel to dry your hair, but that doesn't make the towel a hair dryer.
Putting coffee on a dining room table doesn't turn it into a coffee table.
Spreading Elmer's glue on your teeth doesn't make it tooth paste.
The White House is, in fact, a white house, but my neighbor's white house is not The White House.
I could go on, but I think the above is a sufficient selection to show that language does not, in fact, work that way. You can't decompose a compound noun into its component morphemes and expect to be able to derive the compound's meaning from them.
You wrote so much while failing to read so little:
> in most cases
What do you think will happen if we start comparing the lengths of the list ["hot dog", ...] and the list ["blue bird", "aeroplane", "sunny March day", ...]?
No, I read that, and it's wrong. Can you point me to a single compound noun that works that way?
A bluebird is a specific species. A blue parrot is not a bluebird.
An aeroplane is a vehicle that flies through the air at high speeds, but if you broke it down into morphemes and tried to reason it out that way you could easily argue that a two-dimensional flat surface that extends infinitely in all directions and intersects the air should count.
Sunny March day isn't a compound noun, it's a noun phrase.
Can you point me to a single compound noun (that is, a two-or-more-part word that is widely used enough to earn a definition in a dictionary, like AGI) that can be subjected to the kind of breaking apart into morphemes that you're doing without yielding obviously nonsensical re-interpretations?
I feel like LLMs are the same as the leap from "world before web search" to "world after web search." Yeah, in google, you get crap links for sure, and you have to wade through salesy links and random blogs. But in the pre-web-search world, your options were generally "ask a friend who seems smart" or "go to the library for quite a while," AND BOTH OF THOSE OPTIONS HAD PLENTY OF ISSUES. I found a random part in an old arduino kit I bought years ago, and GPT-4o correctly identified it and explained exactly how to hook it up and code for it to me. That is frickin awesome, and it saves me a ton of time and leads me to reuse the part. I used DeepResearch to research car options that fit my exact needs, and it was 100% spot on - multiple people have suggested models that DeepResearch did not identify that would be a fit, but every time I dig in, I find that DeepResearch was right and the alternative actually had some dealbreaker I had specified. Etc., etc.
In the 90s, Robert Metcalfe infamously wrote "Almost all of the many predictions now being made about 1996 hinge on the Internet’s continuing exponential growth. But I predict the Internet, which only just recently got this section here in InfoWorld, will soon go spectacularly supernova and in 1996 catastrophically collapse." I feel like we are just hearing LLM versions of this quote over and over now, but they will prove to be equally accurate.
Generic. For the Internet, more complex questions would have been "What are the potential benefits, what are the potential risks, what will grow faster," etc. The problem is not the growth but what that growth means. For LLMs, the big clear question is "will they stop just being LLMs, and when will they?" Progress is seen, but we seek a revolution.
It would be fine if it were sold that way, but there is so much hype. We're told that it's going to replace all of us and put us all out of our jobs. They set the expectations so high. Like, remember OpenAI showing a video of it doing your taxes for you? Predictions that super-intelligent AI is going to radically transform society faster than we can keep up? I think that's where most of the backlash is coming from.
> We're told that it's going to replace all of us and put us all out of our jobs.
I think this is the source of a lot of the hype. There are people salivating at the thought of no longer needing to employ the peasant class. They want it so badly that they'll say anything to get more investment in LLMs even if it might only ever allow them to fire a fraction of their workers, and even if their products and services suffer because the output they get with "AI" is worse than what the humans they throw away were providing.
They know they're overselling it, but they're also still on their knees praying that by some miracle their LLMs trained on the collective wisdom of Facebook and YouTube comments will one day gain actual intelligence and they can stop paying human workers.
In the meantime, they'll shove "AI" into everything they can think of for testing and refinement. They'll make us beta test it for them. They don't really care if their AI makes your customer service experience go to shit. They don't care if their AI screws up your bill. They don't care if their AI rejects your claims or you get denied services you've been paying for and are entitled to. They don't care if their AI unfairly denies you parole or mistakenly makes you the suspect of a crime. They don't care if Dr. Sbaitso 2.0 misdiagnoses you. Your suffering is worth it to them as long as they can cut their headcount by any amount and can keep feeding the AI more and more information because just maybe with enough data one day their greatest dream will become reality, and even if that never happens a lot of people are currently making massive amounts of money selling that lie.
The problem is that the bubble will burst eventually. The more time goes by and AI doesn't live up to the hype, the harder that hype becomes to sell. Especially when, by shoving AI into everything, they're exposing a lot of hugely embarrassing shortcomings. Repeating "AI will happen in just 10 more years" gives people a lot of time to make money and cash out, though.
On the plus side, we do get some cool toys to play with and the dream of replacing humans has sparked more interest in robotics so it's not all bad.
Yeah, it won't do your taxes for you, but it can sure help you do them yourself. Probably won't put you out of your job either, but it might help you accomplish more. Of course, one result of people accomplishing more in less time is that you need fewer people to do the same amount of work - so some jobs could be lost. But it's also possible that for the most part instead, more will be accomplished overall.
People frame that like it's something we gain, efficiency, as if before we were wasting time by thinking for ourselves. I get that they can do certain things better, but I'm not sure that delegating to them is free of charge. We're paying something, losing something. Probably learning and fulfillment. We become increasingly dependent on machines to do anything.
Something important happened when we turned the tables around, I don't feel it gets the credit it should. It used to be humans telling machines what to do. Now we're doing the opposite.
And it might even be right and not get you in legal trouble! Not that you'd know (until audit day) unless you went back and did them as a verification though.
Except now, you can hire a competent professional accountant and discover on audit day that they got taken over by private equity, replaced 90% of the professionals doing work with AI and made a lot of money before the consequences become apparent.
Yes, but you're going to pay through the nose for the "wouldn't have to worry about legal trouble at all" (part of what you're paying for with professional services is a degree of protection from their fuckups).
So going back to apples-and-apples comparison, i.e. assuming that "spend a lot of money to get it done for you" is not on the table, I'd trust current SOTA LLM to do a typical person's taxes better than they themselves would.
I pay my accountant 500 USD to file my taxes. I don't consider that "through the nose" relative to my inflated tech salary.
If a person is making a smaller income their tax situation is probably very simple, and can be handled by automated tools like TurboTax (as the sibling comment suggests).
I don't see a lot of value add from LLMs in this particular context. It's a situation where small mistakes can result in legal trouble or thousands of dollars of losses.
I'm on a financial forum where people often ask tax questions, generally _fairly_ simple questions. An obnoxious recent trend on many forums, including this one, is idiots feeding questions into a magic robot and posting what it says as a response. Now, ChatGPT may be very good at, er, something, I dunno, I am assured that it has _some_ use by the evangelists, but it is not good at tax, and if people follow many of the answers it gives then they are likely to get in trouble.
If a trillion-parameter model can't handle your taxes, that to me says more about the tax code than the AI code.
People who paste undisclosed AI slop in forums deserve their own place in hell, no argument there. But what are some good examples of simple tax questions where current models are dangerously wrong? If it's not a private forum, can you post any links to those questions?
So, a super-basic one I saw recently, in relation to Irish tax. In Ireland, ETFs are taxed differently to normal stocks (most ETFs available here are accumulating, they internally re-invest dividends; this is uncommon for US ETFs for tax reasons). Normal stocks have gains taxed under the capital gains regime (33% on gains when you sell). ETFs are different; they're taxed 40% on gains when you sell, and they are subject to 'deemed disposal'; every 8 years, you are taxed _as if you had sold and re-bought_. The ostensible reason for this is to offset the benefit from untaxed compounding of dividends.
Anyway, the magic robot 'knew' all that. Where it slipped up was in actually _working_ with it. Someone asked for a comparison of taxation on a 20 year investment in individual stocks vs ETFs, assuming re-investment of dividends and the same overall growth rate. The machine happily generated a comparison showing individual stocks doing massively better... On closer inspection, it was comparing growth for 20 years for the individual stocks to growth of 8 years for the ETFs. (It also got the marginal income tax rate wrong.)
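(To make the failure concrete, here is a rough sketch of the straightforward version of that comparison. The principal and growth rate are my own illustrative assumptions, not the original poster's; it ignores dividend taxation entirely and just applies the 33%/40% rates and 8-year deemed disposal described above:)

    # Rough sketch of the 20-year stocks-vs-ETF comparison described above.
    # Assumptions (mine, purely illustrative): a 100k lump sum, 6% annual growth,
    # no dividend tax modelling, deemed-disposal tax paid out of the holding,
    # and the cost basis stepped up after each deemed disposal.

    def stocks_after(principal=100_000.0, growth=0.06, years=20, cgt=0.33):
        value = principal * (1 + growth) ** years
        return value - cgt * (value - principal)      # CGT only on the final sale

    def etf_after(principal=100_000.0, growth=0.06, years=20, exit_tax=0.40):
        value, basis = principal, principal
        for year in range(1, years + 1):
            value *= 1 + growth
            if year % 8 == 0:                         # deemed disposal every 8 years
                value -= exit_tax * (value - basis)   # pay the tax out of the holding
                basis = value                         # basis resets after deemed disposal
        return value - exit_tax * (value - basis)     # tax on the real sale at year 20

    print(f"stocks over 20 years: {stocks_after():,.0f}")
    print(f"ETF over 20 years:    {etf_after():,.0f}")

Under those assumptions the individual stocks still come out ahead, but nowhere near the "massively better" result you get by accidentally compounding the ETF for only 8 of the 20 years.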
But the nonsense it spat out _looked_ authoritative on first glance, and it was a couple of replies before it was pointed out that it was completely wrong. The problem isn't that the machine doesn't know the rules; insofar as it 'knows' anything, it knows the rules. But it certainly can't reliably apply them.
(I'd post a link, but they deleted it after it was pointed out that it was nonsense.)
Interesting, thanks. That doesn't seem like an entirely simple question, but it does demonstrate that the model is still not great at recognizing when it is out of its league and should either hedge its answer, refuse altogether, or delegate to an appropriate external tool.
This failure seems similar to a case that someone brought up earlier ( https://news.ycombinator.com/item?id=43466531 ). While better than expected at computation, the transformer model ultimately overestimates its own ability, running afoul of Dunning-Kruger much like humans tend to.
Replying here due to rate-limiting:
One interesting thing is that when one model fails spectacularly like that, its competitors often do not. If you were to cut/paste the same prompt and feed it to o1-pro, Claude 3.7, and Gemini 2.5, it's possible that they would all get it wrong (after all, I doubt they saw a lot of Irish tax law during training.) But if they do, they will very likely make different errors.
Unfortunately it doesn't sound like that experiment can be run now, but I've run similar tests often enough to tell me that wrong answers or faulty reasoning are more likely model-specific shortcomings rather than technology-specific shortcomings.
That's why I get triggered when people speak authoritatively on here about what AI models "can't do" or "will never be able to do." These people have almost always, almost without exception, been proven dead wrong in the past, but that never seems to bother them.
It's the sort of mistake that it's hard to imagine a human making, is the thing. Many humans might have trouble compounding at all, but the 20 year/8 year confusion just wouldn't happen. And I think it is on the simple side of tax questions (in particular all the _rules_ involved are simple, well-defined, and involve no ambiguity or opinion; you certainly can't say that of all tax rules). Tax gets _complicated_.
This reminds me of the early days of Google, when people who knew how to phrase a query got dramatically better results than those who basically just entered what they were looking for as if asking a human.
And indeed, phrasing your prompts is important here too, but I mean more that by having a bit of an understanding of how it works and how it differs from a human, you can avoid getting sucked in by most of these gaps in its abilities, while benefitting from what it's good at. I would ask it the question about the capital gains rules (and would verify the response probably with a link I'd ask it to provide), but I definitely wouldn't expect it to correctly provide a comparison like that. (I might still ask, but would expect to have to check its work.)
Forget OpenAI ChatGPT doing your taxes for you. Now Gemini will write up your sales slides about Gouda cheese, stating wrongly in the process that gouda makes up about 50% of all cheese consumption worldwide :) These use-cases are getting more useful by the day ;)
I mean, it's been like 3 years. 3 years after the web came out was barely anything. 3 years after the first GPU was cool, but not that cool. The past three years in LLMs? Insane.
Things could stall out and we'll have bumps and delays ... I hope. If this thing progresses at the same pace, or speeds up, well ... reality will change.
Or not. Even as they are, we can build some cool stuff with them.
> And people just sit around, unimpressed, and complain that ... what ... it isn't a perfect superintelligence that understands everything perfectly?
The trouble is that, while incredibly amazing, mind blowing technology, it falls down flat often enough that it is a big gamble to use. It is never clear, at least to me, what it is good at and what it isn't good at. Many things I assume it will struggle with, it jumps in with ease, and vice versa.
As the failures mount, I admittedly do find it becoming harder and harder to compel myself to see if it will work for my next task. It very well might succeed, but by the time I go to all the trouble to find out it often feels that I may as well just do it the old fashioned way.
If I'm not alone, that could be a big challenge in seeing long-term commercial success. Especially given that commercial success for LLMs is currently defined as 'take over the world' and not 'sustain mom and pop'.
> the speed at which it is progressing is insane.
But the same goes for the users! As a result, the failure rate appears to be closer to a constant. Until we reach the end of human achievement, where humans can no longer think of new ways to use LLMs, that is unlikely to change.
It's becoming clear to me that some people just have vastly different uses and use cases than I do. Summarizing a deep, cutting-edge physics paper is, I'm sure, vastly different from summarizing a web page while I'm browsing HN, or writing a Python plugin for Icinga to monitor a web endpoint that spits out JSON.
The author says they use several LLMs every day and they always produce incorrect results. That "feels" weird, because it seems like you'd develop an intuition fairly quickly for the kinds of questions you'd ask that LLMs can and can't answer. If I want something with links to back up what is being said, I know I should ask Perplexity or maybe just ask a long-form prompt-like question of Google or Kagi. If I want a Python or bash program I'm probably going to ask ChatGPT or Gemini. If I want to work on some code I want to be in Cursor and am probably using Claude. For general life questions, I've been asking Claude and ChatGPT.
Running into the same issue with LLMs over and over for years, with all due respect, seems like the "doing the same thing and expecting different results" situation.
This is so true. I really hope she joins this conversation so we can have a productive discussion and understand what she's actually hoping to achieve.
The two sides are never going to understand each other because I suspect we work on entirely different things and have radically different workflows. I suspect that hackernews gets more use out of LLMs in general than the average programmer because they are far more likely to be at a web startup and more likely to actually be bottlenecked on how fast you can physically put more code in the file and ship sooner.
If you work on stuff that is at all niche (as in, stack overflow was probably not going to have the answer you needed even before LLMs became popular), then it's not surprising when LLMs can't help because they've not been trained.
For people that were already going fast and needed or wanted to put out more code more quickly, I'm sure LLMs will speed them up even more.
For those of us working on niche stuff, we weren't going fast in the first place or being judged on how quickly we ship in all likelihood. So LLMs (even if they were trained on our stuff) aren't going to be able to speed us up because the bottleneck has never been about not being able to write enough code fast enough. There are architectural and environmental and testing related bottlenecks that LLMs don't get rid of.
That's a good point, I've personally not got much use out of LLMs (I use them to generate fantasy names for my D&D campaign, but find they fall down for anything complex) - but I've also never got much use out of StackOverflow either.
I don't think I'm working on anything particularly niche, but nor is it cookie-cutter generic either, and that could be enough to drastically reduce their utility.
Two things can be true: e.g., that LLMs are incredible tech we only dreamed of having, and that they’re so flawed that they’re hard to put to productive use.
I just tried to use the latest Gemini release to help me figure out how to do some very basic Google Cloud setup. I thought my own ignorance in this area was to blame for the 30 minutes I spent trying to follow its instructions - only to discover that Gemini had wildly hallucinated key parts of the plan. And that’s Google’s own flagship model!
I think it’s pretty telling that companies are still struggling to find product-market fit in most fields outside of code completion.
It really depends on the task. Like Sabine, I’m operating on the very frontier of a scientific domain that is extremely niche. Every single LLM out there is worse than useless in this domain. It spits out incomprehensible garbage.
But ask it to solve some leet code and it’s brilliant.
The question I ask afterwards, then, is: is solving some leet code brilliant? Is designing a simple inventory system brilliant if they've all been accomplished already? My answer tends towards no, since they still make mistakes in the process, and it keeps newer developers from learning.
It is a manner of speaking. That said, I have seen LLMs do brilliant work. There are just some things, like the hard sciences, where their understanding is only surface deep.
I should start collecting examples, if only for threads like this. Recently I tried to get an LLM to write a tsserver plugin that treats lines ending with "//del" as empty. You can only imagine all the sneaky failures in the chat and the total uselessness of the results.
Anything that does not appear literally millions (billions?) of times in the training set is doomed to be fantasized about by an LLM. In various ways, tones, etc. After many such threads I came to the conclusion that people who find it mostly useful are simply treading water, as they probably have done for most of their career. Their average product is a React form with a CRUD endpoint and excitement about it. I can't explain their success reports otherwise, because it rarely works on anything beyond that.
Welcome to the new digital divide people, and the start of a new level of "inequality" in this world. This thread is proof that we've diverged and there is a huge subset of people that will not have their minds changed easily.
Hallucinating incorrect information is worse than useless. It is actively harmful.
I wonder how much this affects our fundraising, for example. No VC understands the science here, so they turn to advisors (which is great!) or to LLMs… which has us starting off on the wrong foot.
I work in a field that is not even close to a scientific niche - software reverse engineering - and LLMs will happily lie to me all the time, for every question I have. I find them useful for generating some initial boilerplate but... that's it. AI autocompletion saved me an order of magnitude more time, and nobody is hyped about it.
Sabine is Lex Fridman for women. Stay in your lane about quantum physics and stop trying to opine on LLMs. I’m tired of seeing the huge amount of FUD from her.
Because it has a sample size of our collective human knowledge and language big enough to trick our brains into believing that.
As a parallel thought, it reminds me of a trick Derren Brown did. He picked every horse correctly across 6 races. The person he was picking for was obviously stunned, as were the audience watching it.
The reality of course is just that people couldn't comprehend that he simply had to go to extreme and tedious lengths to make this happen. They started with 7000 people and filmed every one as if they were going to be the "one", and then the probability pyramid just dropped people out. It was such a vast undertaking of time and effort that we're biased towards believing there must be something really happening here.
LLMs currently are a natural language interface to a Microsoft Encarta-like system that is so unbelievably detailed and all-encompassing that we risk accepting that there's something more going on there. There isn't.
Again, it's not intelligence. It's a mirror that condenses our own intelligence and reflects it back to us, using probabilities at a scale that tricks us into the notion that there is something more than just a big index and a clever search interface.
There is no meaningful interpretation of the word intelligence that applies, psychologically or philosophically, to what is going on. Machine Learning is far more apt and far less misleading.
I saw the transition from ML to AI happen in academic papers and then pitch decks in real time. It was to refill the well when investors were losing faith that ML could deliver on the promises. It was not progress driven.
This doesn't make any more sense than calling LLMs "intelligence". There is no "our intelligence" beyond a concept or an idea that you or someone else may have about the collective, which is an abstraction.
What we do each have is our own intelligence, and that intelligence is, and likely always will be, no matter how science progresses, ineffable. So my point is you can't say your made-up/ill-defined concept is any realer than any other made-up/ill-defined concept.
> Wah, it can't write code like a Senior engineer with 20 years of experience!
No, that's not my problem with it. My problem with it is that fabrication is built into all LLMs; they'll make things up a lot. What's worse, people are treating them as authoritative.
Sure, sometimes it produces useful code. And often, it'll simply call the "doTheHardPart()" method. I've even caught it literally writing the wrong algorithm when asked to implement a specific and well-known algorithm. For example, asking it to "write a selection sort" and watching it write a bubble sort instead. No amount of re-prompting pushes it to the right algorithm in those cases either; it'll regenerate the same wrong algorithm over and over.
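(For anyone who hasn't thought about the difference in a while, here is roughly what distinguishes the two; this is my own illustration, not code from any model:)

    def selection_sort(items):
        """Selection sort: each pass finds the minimum of the unsorted suffix
        and swaps it into place. O(n^2) comparisons, at most n-1 swaps."""
        a = list(items)
        for i in range(len(a)):
            smallest = min(range(i, len(a)), key=a.__getitem__)
            a[i], a[smallest] = a[smallest], a[i]
        return a

    def bubble_sort(items):
        """Bubble sort: repeatedly swap adjacent out-of-order pairs until a full
        pass makes no swaps. Same complexity class, but a different algorithm,
        so producing this when asked for a selection sort is simply wrong."""
        a = list(items)
        swapped = True
        while swapped:
            swapped = False
            for i in range(len(a) - 1):
                if a[i] > a[i + 1]:
                    a[i], a[i + 1] = a[i + 1], a[i]
                    swapped = True
        return a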
Outside of programming, this is much worse. I've both seen online and heard people quote LLM output as if it were authoritative. That to me is the bigger danger of LLMs to society. People just don't understand that LLMs aren't high-powered attorneys or world-renowned doctors. And, unfortunately, the incorrect perception of LLMs is being hyped both by LLM companies and by "journalists" who are all too ready to simply run with and discuss the press releases from said LLM companies.
Unfortunately they are trained first and foremost as plausibility engines. The central dogma is that plausibility will (with continuing progress & scale) converge towards correctness, or "faithfulness" as it's sometimes called in the literature.
This remains very far from proven.
The null hypothesis that would be necessary to reject, therefore, is a most unfortunate one, viz. that by training for plausibility we are creating the world's most convincing bullshit machines.
> plausibility [would] converge towards correctness
That is a horribly dangerous idea. We demand that the agent not guess, even - and especially - when the agent is a champion at guessing; we demand that the agent check.
If G guesses from the multiplication table with remarkable success, we more strongly demand that G computes its output accurately instead.
Oracles whose extraordinary average accuracy makes people forget they are not computers are dangerous.
One man's "plausibility" is another person's "barely reasoned bullshit". I think you're being generous, because LLMs explicitly don't deal in facts; they deal in making stuff up that is vaguely reminiscent of fact. Only a few companies are even trying to make reasoning (as in axioms-cum-deductions, i.e., logic per se) a core part of the models, and they're really struggling to hand-engineer the topology and methodology necessary for that to work even roughly as a facsimile of technical reasoning.
I’m not really being generous. I merely think if I’m gonna condemn something as high-profile snake oil for the tragically gullible, it’s helpful to have a solid basis for doing so. And it’s also important to allow oneself to be wrong about something, however remote the possibility may currently seem, and preferably without having to revise one’s principles to recognise it.
As a sort of related anecdote... if you remember the days before Google, people sitting around your dinner table arguing about stuff used to spew all sorts of bullshit, then drop that they have a degree from XYZ university, and they won the argument. When Google/Wikipedia came around, it turned out that those people were in fact just spewing bullshit. I'm sure there was some damage, but it feels like a similar thing. Our "bullshit-radar" seems to be able to adapt to these sorts of things.
Well, conspiracy theories are thriving in this day and age, even with access to technology and information at one's fingertips. Add to that a US administration effectively spewing bullshit every few minutes.
The best example of this was an argument I had a little while ago about self-driving. I mentioned that I have a hard time trusting any system relying only on cameras, to which I was told that I didn't understand how machine learning works, that obviously they were correct and I was wrong, and that every car would be self-driving within 5 years. All of these things could easily be verified independently.
Suffice to say that I am not sure that the "bullshit-radar" is that adaptive...
Mind you, this is not limited to the particular issue at hand, but I think those situations need to be highlighted, because we get fooled easily by authoritative delivery...
Language models are closing the gaps that still remain at an amazing rate. There are still a few gaps, but if we consider what has happened just in the last year, and extrapolated 2-3 years out....
I think you are discounting the fact that you can weed out people who make a habit of that, but you can't do that with LLMs if they are all doing that.
Some people trust Alex Jones, while the vast majority realize that he just fabricates untruths constantly. Far fewer people realize that LLMs do the same.
People know that computers are deterministic, but most don't realize that determinism and accuracy are orthogonal. Most non-IT people give computers authoritative deference they do not deserve. This has been a huge issue with things like Shot Spotter, facial recognition, etc.
One thing I see a lot on X is people asking Grok what movie or show a scene is from.
LLMs must be really, really bad at this because not only is it never right, it actually just makes something up that doesn't exist. Every, single, time.
I really wish it would just say "I'm not good at this, so I do not know."
When your model of the world is built on the relative probabilities of the next opaque apparently-arbitrary number in the context of prior opaque apparently-arbitrary numbers, it must be nearly impossible to tell the difference between “there are several plausible ways to proceed, many of which the user will find useful or informative, and I should pick one” and “I don’t know”. Attempting to adjust to allow for the latter probably tends to make the things output “I don’t know” all the time, even when the output they’d have otherwise produced would have been good.
I thought about this of course, and I think a reasonable 'hack' for now is to more or less hardcode things that your LLM sucks at, and override it to say it doesn't know. Because continually failing at basic tasks is bad for confidence in said product.
I mean, it basically does the same thing if you ask it to do anything racist or offensive, so that override ability is obviously there.
So if it identifies the request as identifying a movie scene, just say 'I don't know', for example.
Hardcode by whom? Who do we trust with this task to do it correctly? Another LLM that suffers from the same fundamental flaw or by a low paid digital worker in a developing country? Because that's the current solution. And who's gonna pay for all that once the dumb investment money runs out, who's gonna stick around after the hype?
By the LLM team (Grok team, in this case). I don't mean for the LLM to be sentient enough to know it doesn't know the answer, I mean for the LLM to identify what is being asked of it, and checking to see if that's something on the 'blacklist of actions I cannot do yet', said list maintained by humans, before replying.
No different than when asking ChatGPT to generate images or videos or whatever before it could, it would just tell you it was unable to.
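As a rough sketch of what I mean - all the names below are made up for illustration, and this is just a guard in front of the model, not how any vendor actually does it:

    # Hypothetical "blacklist of actions I cannot do yet", maintained by humans.
    UNSUPPORTED_TASKS = {
        "identify_movie_scene": "I'm not good at identifying movie or TV scenes, so I don't know.",
        "identify_song_from_clip": "I can't reliably identify songs, so I don't know.",
    }

    def answer(request, classify_intent, generate):
        # classify_intent() and generate() stand in for the vendor's own
        # intent classifier and main model; both are placeholders here.
        task = classify_intent(request)
        if task in UNSUPPORTED_TASKS:
            return UNSUPPORTED_TASKS[task]
        return generate(request)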
> It's impossible to predict with certainty who will be the U.S. President in 2046. The political landscape can change significantly over time, and many factors, including elections, candidates, and events, will influence the outcome. The next U.S. presidential election will take place in 2028, so it would be difficult to know for sure who will hold office nearly two decades from now.
It can do this because it is in fact the most likely thing to continue with, word by word.
But the most likely way to continue a paper is not to say "I don't know" at the end. It is actually to provide sources, which it proceeds to do wrongly.
>> We need an AI technology that can output "don't know" when appropriate. How's that coming along?
Heh. Easiest answer in the world. To be able to say "don't know", one has first to be able to "know". And we ain't there yet, not by a long way. Not even within a million miles of it.
Needs meta-annotation of certainty on all nodes and tokens that accumulates while reasoning. That would also give the ability to train in beliefs, as in overriding any uncertainty. Right now we are in the pure belief phase. AI is its own god right now, pure blissful belief without the sin of doubt.
Sure we have. We don't have a perfect solution but it's miles better than what we have for LLMs.
If a lawyer consistently makes stuff up on legal filings, in the worst cases they can lose their license (though they'll most likely end up getting fines).
If a doctor really sucks, they become uninsurable and ultimately could lose their medical license.
Devs that don't double check their work will cause havoc with the product and, not only will they earn low opinions from their colleagues, they could face termination.
How many companies train on data that contains "I don't know" responses? Have you ever talked with a toddler / young child? You need to explicitly teach children not to bullshit. At least I needed to teach mine.
Never mind toddlers, have you ever hired people? A far smaller proportion of professional adults will say “I don’t know” than a lot of people here seem to believe.
No, I call judgement a logical process of assessment.
You have an amount of material that speaks of the endeavours in some sport of some "Michael Jordan"; the logic in the system decides that if a "Michael Jordan" in context can be construed to be "that" "Michael Jordan", then there is a sound probability he is a sportsman. You have very little material about a "John R. Brickabracker"; the logic in the system decides that the material is insufficient to take a good guess.
Then I expect your personal fortunes are tied up in hyping the "generative AI are just like people!" meme. Your comment is wholly detached from the reality of using LLMs. I do not expect we'll be able to meet eye-to-eye on the topic.
This exists: each next token has a probability assigned to it. High probability means "it knows"; if there are two or more tokens of similar probability, or the probability of the first token is low in general, then you are less confident about that datum.
Of course there are areas where there's more than one possible answer, but both possibilities are very consistent. I feel LLMs (ChatGPT) do this fine.
Also, can we stop pretending with the generic name for ChatGPT? It's like calling Viagra "sildenafil" instead of Viagra. Cut it out; there's the real deal and there are imitations.
> low in general, then you are less confident about that datum
It’s very rarely clear or explicit enough when that’s the case. Which makes sense considering that the LLMs themselves do not know the actual probabilities
Maybe this wasn't clear, but the probabilities are a low-level variable that may not be exposed in the UI; it IS exposed through the API as logprobs in the ChatGPT API. And of course, if you have binary access, like with a Llama LLM, you may have even deeper access to this p variable.
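For example, a rough sketch using the openai Python package (the model name is just an example; any chat model that returns logprobs should behave similarly):

    import math
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name
        messages=[{"role": "user", "content": "When was Mr. T born?"}],
        logprobs=True,
        top_logprobs=3,  # also return the 3 most likely alternatives per position
    )

    for tok in resp.choices[0].logprobs.content:
        # Convert log-probabilities to probabilities; a low top probability or
        # close runners-up suggest the model was guessing at that token.
        alts = [(alt.token, round(math.exp(alt.logprob), 3)) for alt in tok.top_logprobs]
        print(repr(tok.token), round(math.exp(tok.logprob), 3), alts)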
> it IS exposed through the API as logprobs in the ChatGPT API
Sure but they often are not necessarily easily interpretable or reliable.
You can use it to compare a model’s confidence of several different answers to the same question but anything else gets complicated and not necessarily that useful.
This is very subjective, but I feel they are all imitators of ChatGPT. I also contend that the ChatGPT API (and UI) will become, or has become, a de facto standard in the same manner that Intel's 8086 instruction set evolved into x86.
would you rather the LLM make up something that sounds right when it doesn't know, or would you like it to claim "i don't know" for tasks it actually can figure out? because presumably both happen at some rate, and if it hallucinates an answer i can at least check what that answer is or accept it with a grain of salt.
nobody freaks out when humans make mistakes, but we assume our nascent AIs, being machines, should always function correctly all the time
> would you rather the LLM make up something that sounds right when it doesn't know, or would you like it to claim "i don't know" for tasks it actually can figure out?
And that's part of the problem - you're thinking of it like a hammer when it's not a hammer. It's asking someone at a bar a question. You'll often get an answer - but even if they respond confidently that doesn't make it correct. The problem is people assuming things are fact because "someone at a bar told them." That's not much better than, "it must be true I saw it on TV".
It's a different type of tool - a person has to treat it that way.
Asking a question is very contextual. I don't ask a lawyer about house engineering problems, nor my doctor how to bake a cake. That means that if I'm asking someone at a bar, I'm already prepared to deal with the fact that the person is maybe drunk and probably won't know... And more often than not, I won't even ask the question unless there's a dire need, because it's the most inefficient way to get an informed answer.
I wouldn't bat an eye if people were taking code suggestions, then reviewing and editing them to make them correct. But from what I see, it's pretty much a direct push to production if they got it to compile, which is different from correct.
> What's worse, people are treating them as authoritative. … I've both seen online and heard people quote LLM output as if it were authoritative.
That's not an LLM problem. But indeed quite bothersome. Don't tell me what ChatGPT told you. Tell me what you know. Maybe you got it from ChatGPT and verified it. Great. But my jaw kind of drops when people cite an LLM and just assume it’s correct.
It might not be an LLM problem, but it’s an AI-as-product problem. I feel like every major player’s gamble is that they can cement distinct branding and model capabilities (as perceived by the public) faster than the gradual calcification of public AI perception catches up with model improvements - every time a consumer gets burned by AI output in even small ways, the “AI version of Siri/Alexa only being used for music and timers” problem looms a tiny, tiny bit larger.
These are not similar. Wikipedia says the same thing to everybody, and when what it says is wrong, anybody can correct it, and they do. Consequently it's always been fairly reliable.
Lies and mistakes persist on Wikipedia for many years. They just need to sound truthy so they don't jump out at Wikipedia power users who aren't familiar with the subject. I've been keeping tabs on one for about five years, and it's several years older than that, which I won't correct because I am IP range banned and I don't feel like making an account and dealing with any basement-dwelling power editor NEETs who read Wikipedia rules and processes for fun. I know I'm not the only one to notice it, because this glaring error isn't in a particularly obscure niche; it's in the article for a certain notorious defense initiative which has been in the news lately, so this error has plenty of eyes on it.
In fact, the error might even be a good thing; it reminds attentive readers that Wikipedia is an unreliable source and you always have to check if citations actually say the thing which is being said in the sentence they're attached to.
That's true too, but the bigger difference from my point of view is that factual errors in Wikipedia are relatively uncommon, while, in the LLM output I've been able to generate, factual errors vastly outnumber correct facts. LLMs are fantastic at creativity and language translation but terrible at saying true things instead of false things.
Comments like these honestly make me much more concerned than LLM hallucinations. There have been numerous times when I've tracked down the source for a claim, only to find that the source was saying something different, or that the source was completely unreliable (sometimes on the crackpot level).
Currently, there's a much greater understanding that LLM's are unreliable. Whereas I often see people treat Wikipedia, posts on AskHistorians, YouTube videos, studies from advocacy groups, and other questionable sources as if they can be relied on.
The big problem is that people in general are terrible at exercising critical thinking when they're presented with information. It's probably less of an issue with LLMs at the moment, since they're new technology and a certain amount of skepticism gets applied to their output. But the issue is that once people have gotten more used to them, they'll turn off their critical thinking in the same manner that they turn it off when absorbing information from other sources that they're used to.
Wikipedia is fairly reliable if our standard isn't a platonic ideal of truth but real-world comparators. Reminds me of Kant's famous line. "From the crooked timber of humankind, nothing entirely straight can be made".
The sell of Wikipedia was never "we'll think so you don't have to", it was never going to disarm you of your skepticism and critical thought, and you can actually check the sources. LLMs are sold as "replace knowledge work(ers)", you cannot check their sources, and the only way you can check their work is by going to something like Wikipedia. They're just fundamentally different things.
> The sell of Wikipedia was never "we'll think so you don't have to", it was never going to disarm you of your skepticism and critical thought, and you can actually check the sources.
You can check them, but Wikipedia doesn't care what they say. When I checked a citation on the French Toast page, and noted that the source said the opposite of what Wikipedia did by annotating that citation with [failed verification], an editor showed up to remove that annotation and scold me that the only thing that mattered was whether the source existed, not what it might or might not say.
I feel like I hear a lot of criticism about Wikipedia editors, but isn't Wikipedia overall pretty good? I'm not gonna defend every editor action or whatever, but I think the product stands for itself.
Wikipedia is overall pretty good, but it sometimes contains erroneous information. LLMs are overall pretty good, but they sometimes contain erroneous information.
The weird part is when people get really concerned that someone might treat the former as a reliable source, but then turn around and argue that people should treat the latter as a reliable source.
I had a moment of pique where I was just gonna copy-paste my reply to this rehash of your original point that is non-responsive to what I wrote, but I've composed myself. Instead, I will link to the Wikipedia article for Equivocation [0] and ChatGPT's answer to "are wikipedia and LLMs alike?"
Wikipedia occasionally has errors, which are usually minor. The LLMs I've tried occasionally get things right, but mostly emit limitless streams of plausible-sounding lies. Your comment paints them as much more similar than they are.
In my experience, it's really common for wikipedia to have errors, but it's true that they tend to be minor. And yes, LLMs mostly just produce crazy gibberish. They're clearly worse than wikipedia. But I don't think wikipedia is meeting a standard it should be proud of.
> Whereas I often see people treat Wikipedia, posts on AskHistorians, YouTube videos, studies from advocacy groups, and other questionable sources as if they can be relied on.
One of these things is not like the others! Almost always, when I see somebody claiming Wikipedia is wrong about something, it's because they're some kind of crackpot. I find errors in Wikipedia several times a year; probably the majority of my contribution history to Wikipedia https://en.wikipedia.org/wiki/Special:Contributions/Kragen consists of me correcting errors in it. Occasionally my correction is incorrect, so someone corrects my correction. This happens several times a decade.
By contrast, I find many YouTube videos and studies from advocacy groups to be full of errors, and there is no mechanism for even the authors themselves to correct them, much less for someone else to do so. (I don't know enough about posts on AskHistorians to comment intelligently, but I assume that if there's a major factual error, the top-voted comments will tell you so—unlike YouTube or advocacy-group studies—but minor errors will generally remain uncorrected; and that generally only a single person's expertise is applied to getting the post right.)
But none of these are in the same league as LLM output, which in my experience usually contains more falsehoods than facts.
> Currently, there's a much greater understanding that LLM's are unreliable.
Wikipedia being world-editable and thus unreliable has been beaten into everyone's minds for decades.
LLMs just popped into existence a few years ago, backed by much hype and marketing about "intelligence". No, normal people you find on the street do not in fact understand that they are unreliable. Watch some less computer literate people interact with ChatGPT - it's terrifying. They trust every word!
If you read a non-fiction book on any topic, you can probably assume that half of the information in it is just extrapolated from the authors experience.
Even scientific articles are full of inaccurate statements, the only thing you can somewhat trust are the narrow questions answered by the data, which is usually a small effect that may or may not be reproducible...
No, different media are different—or, better said, different institutions are different, and different media can support different institutions.
Nonfiction books and scientific papers generally only have one person, or at best a dozen or so (with rare exceptions like CERN papers), giving attention to their correctness. Email messages and YouTube videos generally only have one. This limits the expertise that can be brought to bear on them. Books can be corrected in later printings, an advantage not enjoyed by the other three. Email messages and YouTube videos are usually displayed together with replies, but usually comments pointing out errors in YouTube videos get drowned in worthless me-too noise.
But popular Wikipedia articles are routinely corrected by hundreds or thousands of people, all of whom must come to a rough consensus on what is true before the paragraph stabilizes.
Consequently, although you can easily find errors in Wikipedia, they are much less common in these other media.
Yes, though by different degrees. I wouldn't take any claim I read on Wikipedia, got from an LLM, saw in a AskHistorians or Hacker News reply, etc., as fact, and I would never use any of those as a source to back up or prove something I was saying.
Newspaper articles? It really depends. I wouldn't take paraphrased quotes or "sources say" as fact.
But as you move to generally more reliable sources, you also have to be aware that they can mislead in different ways, such as constructing the information in a particular way to push a particular narrative, or leaving out inconvenient facts.
And that is still accurate today. Information always contains a bias from the narrator's perspective. Having multiple sources allows one to triangulate the accuracy of information. Making people use one source of information would allow the business to control the entire narrative. It's just more of a business around people and sentiments than being bullish on science.
And they were right, right? They recognized it had structural faults that made it possible for bad data to seep in. The same is valid for LLMs: they have structural faults.
So what is your point? You seem to have placed assumptions there. And broad ones, so that differences between the two things, and complexities, the important details, do not appear.
It is, if the purpose of LLMs was to be AI. "Large language model" as a choir of pseudorandom millions converged into a voice - that was achieved, but it is by definition out of the professional realm. If it is to be taken as "artificial intelligence", then it has to have competitive intelligence.
> But my jaw kind of drops when people cite an LLM and just assume it’s correct.
Yes but they're literally told by allegedly authoritative sources that it's going to change everything and eliminate intellectual labor, so is it totally their fault?
They've heard about the uncountable sums of money spent on creating such software, why would they assume it was anything short of advertised?
> Yes but they're literally told by allegedly authoritative sources that it's going to change everything and eliminate intellectual labor
Why does this imply that they’re always correct? I’m always genuinely confused when people pretend like hallucinations are some secret that AI companies are hiding. Literally every chat interface says something like “LLMs are not always accurate”.
> Literally every chat interface says something like “LLMs are not always accurate”.
In small, de-emphasized text, relegated to the far corner of the screen. Yet, none of the TV advertisements I've seen have spent any significant fraction of the ad warning about these dangers. Every ad I've seen presents someone asking a question to the LLM, getting an answer and immediately trusting it.
So, yes, they all have some light-grey 12px disclaimer somewhere. Surprisingly, that disclaimer does not carry nearly the same weight as the rest of the industry's combined marketing efforts.
> In small, de-emphasized text, relegated to the far corner of the screen.
I just opened ChatGPT.com and typed in the question “When was Mr T born?”.
When I got the answer there were these things on screen:
- A menu trigger in the top-left.
- Log in / Sign up in the top right
- The discussion, in the centre.
- A T&Cs disclaimer at the bottom.
- An input box at the bottom.
- “ChatGPT can make mistakes. Check important info.” directly underneath the input box.
I dislike the fact that it’s low contrast, but it’s not in a far corner, it’s immediately below the primary input. There’s a grand total of six things on screen, two of which are tucked away in a corner.
This is a very minimal UI, and they put the warning message right where people interact with it. It’s not lost in a corner of a busy interface somewhere.
Maybe it's just down to different screen sizes, but when I open a new chat in chat GPT, the prompt is in the center of the screen, and the disclaimer is quite a distance away at the very bottom of the screen.
Though, my real point is we need to weigh that disclaimer, against the combined messaging and marketing efforts of the AI industry. No TV ad gives me that disclaimer.
Then we can look at people's behavior. Look at the (surprisingly numerous) cases of lawyers getting taken to the woodshed by a judge for submitting filings to a court with ChatGPT-introduced fake citations! Or someone like Ana Navarro confidently repeating an incorrect fact, and when people pushed back, saying "take it up with chat GPT" (https://x.com/ananavarro/status/1864049783637217423).
I just don't think the average person who isn't following this closely understands the disclaimer. Hell, they probably don't even really read it, because most people skip over reading most de-emphasized text in most UIs.
So, in my opinion, whether it's right next to the text-box or not, the disclaimer simply cannot carry the same amount of cultural impact as the "other side of the ledger" that are making wild, unfounded claims to the public.
> Yes but they're literally told by allegedly authoritative sources that it's going to change everything and eliminate intellectual labor, so is it totally their fault?
3rd Order Ignorance (3OI)—Lack of Process.
I have 3OI when I don't know a suitably efficient way to find out I don't know that I don't know something. This is lack of process, and it presents me with a major problem: If I have 3OI, I don't know of a way to find out there are things I don't know that I don't know.
— not from an LLM
My process: use LLMs and see what I can do with them while taking their output with a grain of salt.
But the issue of the structural fault remains. To state the phenomenon (hallucination) is not "superficial", as the root does not add value in the context.
- Symptom: "Response was, 'Use the `solvetheproblem` command'."
- Cause: "It has no method to know that there is no `solvetheproblem` command."
- Alarm: "It is suggested that it is trying to guess a plausible world through lacking wisdom and data."
- Fault: "It should have a database of what seems to be states of facts, and it should have built the ability to predict the world more faithfully to facts."
My company just broadly adopted AI. It’s not a tech company and usually late to the game when it comes to tech adoption.
I’m counting down the days when some AI hallucination makes its way all the way to the C-suite. People will get way too comfortable with AI and don’t understand just how wrong it can be.
Some assumption will come from AI, no one will check it and it’ll become a basic business input. Then suddenly one day someone smart will say “thats not true” and someone will trace it back to AI. I know it.
I assume at that point in time there will be some general directive on using AI and not assuming it’s correct. And then AI will slowly go out of favor.
People fabricate a lot too. Yesterday I spent far less time fixing issues in the far more complex and larger changes Claude Code managed to churn out than what the junior developer I worked with needed. Sometimes it's the reverse. But with my time factored in, working with Claude Code is generally more productive for me than working with a junior. The only reason I still work with a junior dev is as an investment in teaching him.
Every developer I've ever worked with has gotten things wrong. Whether you call that hallucinating or not is irrelevant. What matters is the effort it takes to fix.
On the logically practical point I agree with you (what counts in the end in the specific process you mention is the gain vs loss game), but my point was that if your assistant is structurally delirious you will have to expect a big chunk of the "loss" as structural.
> It turns out that, in Claude, refusal to answer is the default behavior
I.e., boxes that incline to different approaches to heuristic will behave differently and offer different value (to be further assessed within a framework of complexity, e.g. "be creative but strict" etc.)
And my direct experience is that I often spend less time directing, reviewing and fixing code written by Claude Code at this point than I do for a junior irrespective of that loss. If anything, Claude Code "knows" my code bases better. The rest, then, to me at least is moot.
Claude is substantially cheaper for me, per reviewed, fixed change committed. More importantly to me, it demands less of my limited time per reviewed, fixed change committed.
Having a junior dev working with me at this point wouldn't be worth it to me if it wasn't for the training aspect: We still need pipelines of people who will learn to use the AI models, and who will learn to do the things it can't do well.
But my point was: it's good that Claude has become a rightful legend in the realm of coding, but before and regardless, a candidate that told you "that class will have a .SolveAnyProblem() method: I want to believe" presents a handicap. As you said, no assistant has revealed itself to be perfect, but assistants who attempt mixing coding sessions with creative fiction writing raise alarms.
But this was true before LLMs. People would and still do take any old thing from an internet search and treat it as true. There is a known, difficult-to-remedy failure to properly adjudicate information and source quality, and you can find it discussed in research prior to the computer age. It is a user problem more than a system problem.

In my experience, with the way I interact with LLMs, they are more likely to give me useful output than not, and this is borne out by mainstream non-edge-case academic peer-reviewed work. Useful does not necessarily equal 100% correct, just as a Google search does not. I judge and vet all information, whether from an LLM, search, book, paper, or wherever.

We can build a straw person who "always" takes LLM output as true and uses it as-is, but those are the same people who use most information tools poorly, be they internet search, dictionaries, or even looking in their own files for their own work or sent mail (I say this as an IT professional who has seen the worker types from before the pre-internet days through now).

In any case, we use automobiles despite others misusing them. But only the foolish among us completely take our hands off the wheel for any supposed "self-driving" features. While we must prevent and decry the misuse by fools, we cannot let their ignorance hold us back. Let's let their ignorance help us make better tools, as they help identify more undesirable scenarios.
> My problem with it is that inbuilt into the models of all LLMs is that they'll fabricate a lot. What's worse, people are treating them as authoritative.
The same is true about the internet, and people even used to use these arguments to try to dissuade people from getting their information online (back when Wikipedia was considered a running joke, and journalists mocked blogs). But today it would be considered silly to dissuade someone from using the internet just because the information there is extremely unreliable.
Many programmers will say Stack Overflow is invaluable, but it's also unreliable. The answer is to use it as a tool and a jumping off point to help you solve your problem, not to assume that its authoritative.
The strange thing to me these days is the number of people who will talk about the problems with misinformation coming from LLMs, but then who seem to uncritically believe all sorts of other misinformation they encounter online, in the media, or through friends.
Yes, you need to verify the information you're getting, and this applies to far more than just LLMs.
Shades of grey fallacy. You have way more context clues about the information on the internet than you do with an LLM. In fact, with an LLM you have zero(-ish?).
I can peruse your previous posts to see how truthful you are, I can tell if your post has been down/upvoted, I can read responses to your post to see if you've been called out on anything, etc.
This applies tenfold in real life where over time you get to build comprehensive mental models of other people.
I have decided it must be attached to a sort of superiority complex. These types of people believe they are capable of deciphering fact from fiction but the general population isn’t so LLMs scare them because someone might hear something wrong and believe it. It almost seems delusional. You have to be incredibly self aggrandizing in your mind to think this way. If LLMs were actually causing “a problem” then there would be countless examples of humans making critical mistakes because of bad LLM responses, and that is decidedly not happening. Instead we’re just having fun ghiblifying the last 20 years of the internet.
Regardless of anything else, it's far too early to make such claims. We have to wait until people start allowing "AI agents" to make autonomous blackbox decisions with minimal supervision, since nobody has any clue what's happening.
Even if we tone down the sci-fi dystopia angle, not that many people really use LLMs in non-superficial ways yet. What I'm most afraid of is the next generation growing up without the ability to critically synthesize information on their own.
Most people - the vast majority of people - cannot critically synthesize information on their own.
But the implication of what you are saying is that academic rigour is going to be ditched overnight because of LLMs.
That’s a little bit odd. Has the scientific community ever thrown up its collective hands and said “ok, there are easier ways to do things now, we can take the rest of the decade off, phew what a relief!”
> what you are saying is that academic rigour is going to be ditched overnight
Not across all level and certainly not overnight. But a lot of children entering the pipeline might end up having a very different experience than anyone else before LLMs (unless they are very lucky to be in an environment that provides them better opportunities).
> cannot critically synthesize information on their own.
That's true, but if even fewer people try to do that, or even know where to start, it will get even worse.
> I've even caught it literally writing the wrong algorithm when asked to implement a specific and well known algorithm
Happened to me as well. Wanted it to quickly write an algorithm for standard deviation over a stream of data, which is a textbook algorithm. It did it almost right, but messed up the final formula, and the code gave wrong answers. Weird, considering correct code for that problem exists on Wikipedia.
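For reference, the textbook streaming version is Welford's online algorithm; a minimal Python sketch (single pass, constant memory):

    import math

    class RunningStats:
        # Welford's online algorithm: update the mean and the sum of squared
        # deviations (m2) one value at a time, without storing the stream.
        def __init__(self):
            self.n = 0
            self.mean = 0.0
            self.m2 = 0.0

        def add(self, x):
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)

        def stddev(self):
            # Sample standard deviation; needs at least two values.
            return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    stats = RunningStats()
    for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
        stats.add(x)
    print(stats.mean, stats.stddev())  # 5.0 and roughly 2.14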
I don't understand the point of that share. There are likely thousands of implementations of selection sort on the internet and so being able to recreate one isn't impressive in the slightest.
And all the models are identical in not being able to discern what is real or something it just made up.
No? I mean if they refused that would actually be a reasonably good outcome. The real problem is if they generally can write selection sorts but occasionally go haywire due to additional context and start hallucinating.
Because, to be blunt, I think this is total bullshit if you're using a decent model:
"I've even caught it literally writing the wrong algorithm when asked to implement a specific and well known algorithm. For example, asking it "write a selection sort" and watching it write a bubble sort instead. No amount of re-prompts pushes it to the right algorithm in those cases either, instead it'll regenerate the same wrong algorithm over and over."
I was part of preparing an offer a few weeks ago. The customer prepared a lot of documents for us - maybe 100 pages in total. The boss insisted on using ChatGPT to summarize this stuff and read only the summary. I did a longer, slower reading and caught some topics ChatGPT outright dropped. Our offer was based on the summary - and fell through because we missed these nuances.
But hey, the boss did not have to read as much as previously...
I wonder if the exact phrasing has varied from the source, but even then if "consultation partners" is doing the heavy lifting there. If it was something like "useful consultation partners", I can absolutely see value as an extra opinion that is easy to override. "Oh yeah, I hadn't thought about that option - I'll look into it further."
I imagine we're talking about it as an extra resource rather than trusting it as final in a life or death decision.
> I imagine we're talking about it as an extra resource rather than trusting it as final in a life or death decision.
I'd like to think so. Trust is also one of those non-concrete terms that have different meanings to different people. I'd like to think that doctors use their own judgement to include the output from their trained models, I just wonder how long it is till they become the default judgement when humans get lazy.
I think that's a fair assessment on trust as a term, and incorporating via personal judgement. If this was any public story, I'd also factor in breathless reporting about new tech.
Black-box decisions I absolutely have a problem with. But an extra resource considered by people with an understanding of risks is fine by me. Like I've said in other comments, I understand what it is and isn't good at, and have a great time using ChatGPT for feedback or planning or extrapolating or brainstorming. I automatically filter out the "Good point! This is a fantastic idea..." response it inevitably starts with...
Because LLM’s, with like 20% hallucination rate, are more reliable than overworked, tired doctors that can spend only one ounce of their brainpower on the patient they’re currently helping?
In fact, the phenomenon of pseudo-intelligence scares those who were hoping to get tools that limited the original problem, as opposed to potentially boosting it.
The claim seems plausible because it doesn't say there was any formal evaluation, just that some doctors (who may or may not understand how LLMs work) hold an opinion.
> What's worse, people are treating them as authoritative.
So what? People are wrong all the time. What happens when people are wrong? Things go wrong. What happens then? People learn that the way they got their information wasn't robust enough and they'll adapt to be more careful in the future.
This is the way it has always worked. But people are "worried" about LLMs... Because they're new. Don't worry, it's just another tool in the box, people are perfectly capable of being wrong without LLMs.
Being wrong when you are building a grocery management app is one thing, being wrong when building a bridge is another.
For those sensitive use cases, it is imperative we create regulation, like every other technology that came before it, to minimize the inherent risks.
In an unrelated example, I saw someone saying recently they don't like a new version of an LLM because it no longer has "cool" conversations with them, so take that as you will from a psychological perspective.
I have a hard time taking that kind of worry seriously. In ten years, how many bridges will have collapsed because of LLMs? How many people will have died? Meanwhile, how many will have died from fentanyl or cars or air pollution or smoking? Why do people care so much about the hypothetical bad effects from new technology and so little about the things we already know are harmful?
Humans bullshit and hallucinate and claim authority without citation or knowledge. They will believe all manner of things. They frequently misunderstand.
The LLM doesn’t need to be perfect. Just needs to beat a typical human.
LLM opponents aren’t wrong about the limits of LLMs. They vastly overestimate humans.
And many, many companies are proposing and implementing uses for LLM's to intentionally obscure that accountability.
If a person makes up something, innocently or maliciously, and someone believes it and ends up getting harmed, that person can have some liability for the harm.
If a LLM hallucinates something, that somone believes and they end up getting harmed, there's no accountability. And it seems that AI companies are pushing for laws & regulations that further protect them from this liability.
These models can be useful tools, but the targets these AI companies are shooting for are going to be actively harmful in an economy that insists you do something productive for the continued right to exist.
This is correct. On top of that, the failure modes of AI system are unpredictable and incomprehensible. Present day AI systems can fail on/be fooled by inputs in surprising ways that no humans would.
1. To make those harmed whole. On this, you have a good point. The desire of AI firms or those using AI to be indemnified from the harms their use of AI causes is a problem as they will harm people. But it isn't relevant to the question of whether LLMs are useful or whether they beat a human.
2. To incentivize the human to behave properly. This is moot with LLMs. There is no laziness or competing incentive for them.
That’s not a positive at all, the complete opposite. It’s not about laziness but being able to somewhat accurately estimate and balance risk/benefit ratio.
The fact that making a wrong decision would have significant costs for you and other people should have a significant influence on decision making.
That reads as "people shouldn't trust what AI tells them", which is in opposition to what companies want to use AI for.
An airline tried to blame its chatbot for inaccurate advice it gave (whether a discount could be claimed after a flight). Tribunal said no, its chatbot was not a separate legal entity.
Yeah. Where I live, we are always reminded that our conversations with insurance provider personnel over phone are recorded and can be referenced while making a claim.
Imagine a chatbot making false promises to prospective customers. Your claim gets denied, you fight it out only to learn their ToS absolves them of "AI hallucinations".
> LLM opponents aren’t wrong about the limits of LLMs. They vastly overestimate humans.
On the contrary. Humans can earn trust, learn, and can admit to being wrong or not knowing something. Further, humans are capable of independent research to figure out what it is they don't know.
My problem isn't that humans are doing similar things to LLMs, my problem is that humans can understand consequences of bullshitting at the wrong time. LLMs, on the other hand, operate purely on bullshitting. Sometimes they are right, sometimes they are wrong. But what they'll never do or tell you is "how confident am I that this answer is right". They leave the hard work of calling out the bullshit on the human.
There's a level of social trust that exists which LLMs don't follow. I can trust when my doctor says "you have a cold" that I probably have a cold. They've seen it a million times before and they are pretty good at diagnosing that problem. I can also know that doctor is probably bullshitting me if they start giving me advice for my legal problems, because it's unlikely you are going to find a doctor/lawyer.
> Just needs to beat a typical human.
My issue is we can't even measure accurately how good humans are at their jobs. You now want to trust that the metrics and benchmarks used to judge LLMs are actually good measures? So many of the LLM advocates try to pretend that you can objectively measure goodness in subjective fields by just writing some unit tests. It's literally the "Oh look, I have an Oracle Java certificate" or "AWS Solutions Architect" method of determining competence.
And so many of these tests aren't being written by experts. Perhaps the coding tests, but the legal tests? Medical tests?
The problem is LLM companies are bullshitting society on how competently they can measure LLM competence.
> On the contrary. Humans can earn trust, learn, and can admit to being wrong or not knowing something. Further, humans are capable of independent research to figure out what it is they don't know.
Some humans can, certainly. Humans as a race? Maybe, ish.
Well there are still millions that can. There is a handful of competitive LLMs, and their outputs given the same inputs are near identical in relative terms (compared to humans).
Your second point directly contradicts your first point.
In fact we do know how good doctors and lawyers are at their jobs, and the answer is "not very." Medical negligence claims are a huge problem. Claims against lawyers are harder to win - for obvious reasons - but there is plenty of evidence that lawyers cannot be presumed competent.
As for coding, it took a friend of mine three days to go from a cold start with zero dev experience to creating a usable PDF editor with a basic GUI for a specific small set of features she needed for ebook design.
No external help, just conversations with ChatGPT and some Googling.
Obviously LLMs have issues, but if we're now in the "Beginners can program their own custom apps" phase of the cycle, the potential is huge.
> As for coding, it took a friend of mine three days to go from a cold start with zero dev experience to creating a usable PDF editor with a basic GUI for a specific small set of features she needed for ebook design.
This is actually an interesting one - I’ve seen a case where some copy/pasted PDF saving code caused hundreds of thousands of subtly corrupted PDFs (invoices, reports, etc.) over the span of years. It was a mistake that would be very easy for an LLM to make, but I sure wouldn’t want to rely on chatgpt to fix all of those PDFs and the production code relying on them.
Well, humans are not a monolithic hive mind that all behave exactly the same as an "average" lawyer, doctor, etc.; that provides very obvious and very significant advantages.
> days to go from a cold start with zero dev experience
>> In fact we do know how good doctors and lawyers are at their jobs, and the answer is "not very." Medical negligence claims are a huge problem. Claims against lawyers are harder to win - for obvious reasons - but there is plenty of evidence that lawyers cannot be presumed competent.
This paragraph makes little sense. A negligence claim is based on a deviation from some reasonable standard, which is essentially a proxy for the level of care/service that most practitioners would apply in a given situation. If doctors were as regularly incompetent as you are trying to argue then the standard for negligence would be lower because the overall standard in the industry would reflect such incompetence. So the existence of negligence claims actually tells us little about how good a doctor is individually or how good doctors are as a group, just that there is a standard that their performance can be measured against.
I think most people would agree with you that medical negligence claims are a huge problem, but I think that most of those people would say the problem is that so many of these claims are frivolous rather than meritorious, resulting in doctors paying more for malpractice insurance than necessary and also resulting in doctors asking for unnecessarily burdensome additional testing with little diagnostic value so that they don’t get sued.
It's fine if it isn't perfect if whomever is spitting out answers assumes liability when the robot is wrong. But, what people want is the robot to answer questions and there to be no liability when it is well known that the robot can be wildly inaccurate sometimes. They want the illusion of value without the liability of the known deficiencies.
If LLM output is like a magic 8 ball you shake, that is not very valuable unless it is workload management for a human who will validate the fitness of the output.
I never ask a typical human for help with my work, why should that be my benchmark for using an information tool? Afaik, most people do not write about what they don't know, and if one made a habit of it, they would be found and filtered out of authoritative sources of information.
ok, but people are building determinative software _on top of them_. It's like saying "it's ok, people make mistakes, but lets build infrastructure on some brain in a vat". It's just inherently not at the point that you can make it the foundation of anything but a pet that helps you slop out code, or whatever visual or textual project you have.
It's one of those "quantity is so fascinating, let's ignore how we got here in the first place" situations.
You’re moving the goalposts. LLMs are masquerading as superb reference tools and as sources of expertise on all things, not as mere “typical humans.” If they were presented accurately as being about as fallible as a typical human, typical humans (users) wouldn’t be nearly as trusting or excited about using them, and they wouldn’t seem nearly as futuristic.
> I mean, I can ask for obscure things with subtle nuance where I misspell words and mess up my question and it figures it out.
If you're lucky it figures it out. If you aren't, it makes stuff up in a way that seems almost purposefully calculated to fool you into assuming that it's figured everything out. That's the real problem with LLM's: they fundamentally cannot be trusted because they're just a glorified autocomplete; they don't come with any inbuilt sense of when they might be getting things wrong.
I see this complaint a lot, and frankly, it just doesn't matter.
What matters is speeding up how fast I can find information. Not only will LLMs sometimes answer my obscure questions perfectly themselves, but they also help point me to the jargon I need to use to find that information online. In many areas this has been hugely valuable to me.
Sometimes you do just have to cut your losses. I've given up on asking LLMs for help with Zig, for example. It is just too obscure a language I guess, because the hallucination rate is too high to be useful. But for webdev, Python, matplotlib, or bash help? It is invaluable to me, even though it makes mistakes every now and then.
We're talking about getting work done here, not some purity dance about how you find your information the "right way" by looking in books in libraries or something. Or wait, do you use the internet? How very impure of you. You should know, people post misinformation on there!
> Yeah but if your accountant bullshits when doing your taxes, you can sue them.
What is the point of limiting delegation to such an extreme dichotomy, as opposed to getting more things done?
The vast majority of useful things we delegate, or do for others ourselves, are not as well specified, or as legally liable for any imperfections, as an accountant doing accounting.
Let's try it this way: give me one or two prompts that you personally have had trouble with, in terms of hallucinated output and lack of awareness of potential errors or ambiguity. I have paid accounts on all the major models except Grok, and I often find it interesting to probe the boundaries where good responses give way to bad ones, and to see how they get better (or worse) between generations.
Sounds like your experiences, along with zozbot234's, are different enough from mine that they are worth repeating and understanding. I'll report back with the results I see on the current models.
I am so confused too. I hold these beliefs at the same time, and I don't feel they contradict each other, but apparently for many people some of them do:
- LLMs are a miraculous technology that are capable of tasks far beyond what we believed would be achievable with AI/ML in the near future. Playing with them makes me constantly feel like "this is like sci-fi, this shouldn't be possible with 2025's technology".
- LLMs are fairly clueless for many tasks that are easy enough for humans, and they are nowhere near AGI. It's also unclear whether they scale up towards that goal. They are also worse programmers than people make them to be. (At least I'm not happy with their results.)
- Achieving AGI doesn't seem impossibly unlikely any more, and doing so is likely to be an existentially disastrous event for humanity, and the worst fodder of my nightmares. (Also in the sense of an existential doomsday scenario, but even just the thought of becoming... irrelevant is depressing.)
Having one of these beliefs makes me the "AI hyper" stereotype, another makes me the "AI naysayer" stereotype and yet another makes me the "AI doomer" stereotype. So I guess I'm all of those!
> but even just the thought of becoming... irrelevant is depressing
In my opinion, there can exist no AI, person, tool, ultra-sentient omniscient being, etc. that would ever render you irrelevant. Your existence, experiences, and perception of reality are all literally irreplaceable, and (again, just my opinion) inherently meaningful. I don't think anyone's value comes from their ability to perform any particular feat to any particular degree of skill. I only say this because I had similar feelings of anxiety when considering the idea of becoming "irrelevant", and I've seen many others say similar things, but I think that fear is largely a product of misunderstanding what makes our lives meaningful.
I guess that Sabine's beef with LLMs is that they are hyped as a legit "human-level assistant" kind of thing by the business people, which they clearly aren't yet. Maybe I've just managed to... manage my expectations?
That's on her then for fully believing what marketing and business execs are 'telling her' about LLMs. Does she get upset when she buys a coke around Christmas and her life doesn't become all warm and fuzzy with friendliness and cheer all around?
Seems like she's been given a drill with a flathead bit, and just complains for months on end that it often fails (she didn't charge the drill) or gives her useless results (she uses it on Phillips heads). How about figuring out what works and what doesn't, and adjusting your use of the tool accordingly? If she is a painter, don't blame the drill for messing up her painting.
I kinda agree. But she seems smart and knowledgeable. It's kinda disappointing, like... She should know better. I guess it's the Gell-Mann amnesia effect once again.
Back when handwriting recognition was a new thing I was greatly impressed by how good it was. This was primarily because, being an engineer, I knew how difficult the problem is to solve. 90% recognition seemed really good to me.
When I tried to use the technology, that 90% meant 1 out of every 10 things I wrote was incorrect. If it had been a keyboard I would have thrown it in the trash. That is where my Palm ended up.
People expect their technology to do things better not almost as well as a human. Waymo with LIDAR hasn't killed people. Tesla, with camera only, has done so multiple times. I will ride in a Waymo never in a Tesla self driving car.
Anyone who doesn't understand this either isn't required to use the utility it provides or has no idea how to prompt it correctly. My wife is a bookkeeper. There are some tasks that are a pain in the ass without writing some custom code. In her case, we just saved her about 2 hours by asking Claude to do it. It wrote the code, applied the code to a CSV we uploaded, and gave us exactly what we needed in 2 minutes.
>Anyone who doesn't understand this either isn't required to use the utility it provides or has no idea how to prompt it correctly.
Almost every counter-criticism of LLMs boils down to:
1. you're holding it wrong
2. Well, I use it at $DAYJOB and it works great for me! (And $DAYJOB is software engineering.)
I'm glad your wife was able to save 2 hours of work, but forgive me if that doesn't translate to the trillion dollar valuation OpenAI is claiming. It's strange you don't see the inherent irony in your post. Instead of your wife just directly uploading the dataset and a prompt, she first has to prompt it to write code. There are clear limitations and it looks like LLMs are stuck at some sort of wall.
When computers/the internet first came about, there were (and still are!) people who would struggle with basic tasks. Without knowing the specific task you are trying to do, it's hard to judge whether it's a problem with the model or with you.
I would also say that prompting isn't as simple as it is made out to be. It is a skill in itself and requires you to be a good communicator. In fact, I would say there is a reasonable chance that even if we end up with AGI-level models, a good chunk of people will not be able to use them effectively because they can't communicate requirements clearly.
So it's a natural language interface, except it can only be useful if we stick to a subset of natural language. Then we're stuck trying to reverse-engineer an undocumented, non-deterministic API, one that will keep changing under whatever you build on top of it. That is a pretty horrid value proposition.
Short of it being able to mind read, you need to communicate with it in some way. No different from the real world where you'll have a harder time getting things done if you don't know how to effectively communicate. I imagine for a lot of popular use-cases, we'll build a simpler UX for people to click and tap before it gets sent to a model.
Boiling down to a couple cases would be more useful if you actually tried to disprove those cases or explain why they're not good enough.
> It's strange you don't see the inherent irony in your post. Instead of your wife just directly uploading the dataset and a prompt, she first has to prompt it to write code. There are clear limitations and it looks like LLMs are stuck at some sort of wall.
What's ironic about that? That's such a tiny imperfection. If that's anything near the biggest flaw then things look amazing. (Not that I think it is, but I'm not here to talk about my opinion, I'm here to talk about your irony claim.)
>Boiling down to a couple cases would be more useful if you actually tried to disprove those cases or explain why they're not good enough.
This reply is 4 comments deep into such cases, and the OP is about a well-educated person who describes their difficulties.
>What's ironic about that? That's such a tiny imperfection.
I'd argue it's not tiny - it highlights the limitations of LLMs. LLMs excel at writing basic code but seem to struggle, or are untrustworthy, outside of those tasks.
Imagine generalizing his case: his wife goes to work and tells the other bookkeepers "ChatGPClaudeSeek is amazing, it saved 2 hours for me". A coworker, married to a lawyer instead of a software engineer, hears this, tries it for himself, and comes up short. Returning to work the next day and talking about his experience, he is told: "oh, you weren't holding it right, ChatGPClaudeSeek can't do the work for you, you have to ask it to write code, that you must then run". Turns out he needs an expert to hold it properly and from the coworker's point of view he would probably need to hire an expert to help automate the task, which will likely only be marginally less expensive than it was 5 years ago.
From where I stand, things don't look amazing; at least not as amazing as the fundraisers have claimed. I agree that LLMs are awesome tools - but I'm evaluating them against the potential future where OpenAI is worth a trillion dollars and is replacing every job. You call it a tiny imperfection, but that comes across as myopic to me - large swaths of industries can't effectively use LLMs! How is that tiny?
> Turns out he needs an expert to hold it properly and from the coworker's point of view he would probably need to hire an expert to help automate the task, which will likely only be marginally less expensive than it was 5 years ago.
The LLM wrote the code, then used the code itself, without needing a coder around. So the only negative was needing to ask it specifically to use code, right? In that case, with code being the thing it's good at, "tell the LLM to make and use code" is going to be in the basic tutorials. It doesn't need an expert. It really is about "holding it right" in a non-mocking way, the kind of instructions you expect to go through for using a new tool.
If you can go through a one hour or less training course while only half paying attention, and immediately save two hours on your first use, that's a great return on the time investment.
It's definitely a tech that's here to stay, unlike blockchain/NFTs.
But I share the confusion about why people are still bullish on it.
The current valuation exists because the market thinks it can write code like a senior engineer and deliver AGI, because that's how the LLM providers market it.
I'm not even certain if they'll be ubiquitous after the venture capital investments are gone and the service needs to actually be priced without losing money, because they're (at least currently) mostly pretty expensive to run.
There seems to be a widely held misconception that company valuations have any basis in the underlying fundamentals of what the companies do. This is not and has not been the case for several years. The US stock market’s darlings are Kardashians, they are valuable for being valuable the way the Kardashians are famous for being famous.
In markets, perception is reality, and the perception is that these companies are innovative. That’s it.
NFT is still a great tool if you want a bunch of unique tokens as part of a blockchain app. ERC-721 was proven a capable protocol in a variety of projects. What it isn't, and never will be, is an amazing investment opportunity, or a method to collect cool rare apes and go to yacht parties.
LLMs will settle in and have their place too, just not in the forefront of every investors mind.
I am more than happy to pay for access to LLMs, and models continue to get smaller and cheaper. I would be very surprised if they are not far more widely used in 5 or 10 years time than they are today.
None of that means that the current companies will be profitable or that their valuations are anywhere close to justified though. The future could easily be "Open-weight models are moderately useful for some niches, no-name cloud providers charge slightly higher than the cost of electricity to use them at low profit margins".
Dot-com boom/bubble all over again. A whole bunch of the current leaders will go bust. A new generation of companies will take over, actually focused on specific customer problems and growing out of profitable niches.
The technology is useful, for some people, in some situations. It will get more useful for more people in more situations as it improves.
Current valuations are too high (Gartner hype cycle), after they collapse valuations will be too low (again, hype cycle), then it'll settle down and the real work happens.
The existing tech giants will just hoover up all the niche LLM shops once the valuations deflate somewhat.
There's almost no chance any one of these shops stays truly independent, unless propped up by a state-level actor (China/EU).
You might have some consulting/service companies that will promise to tailor big models to your specific needs, but they will be valued accordingly (nowhere near billions).
Yeah, that's probably true, the same happened after the dot-com bubble burst. From about 2005-15 if you had a vaguely promising idea and a few engineers you could get acqui-hired by a tech giant easily. The few profitable ones that refused are now middle-sized businesses doing OK (nowhere near billions).
I don't know if the survivors are going to be in consulting - there is some kind of LLM-based product capability here, and you could conceivably see a set of companies emerge that build LLM-based products. But it'll probably be a bit different, like the mobile app boom was a bit different from the web boom.
That's been the 'endgame' of technology improvements since the industrial revolution - there are many industries that mechanized, replaced nearly their entire human workforce, and were never terribly profitable. Consider farming - in developed countries, they really did replace like 98% of the workforce with machines. For every farm that did so, so did all of their competitors, and the increased productivity caused the price of their crops to fall. Cheap food for everyone, but no windfall for farmers.
If machines can easily replace all of your workers, that means other people's machines can also replace your workers.
Yeah, the overblown hype is a feature of the hype cycle. The same was true for the web - it was going to replace retail, change the way we work and live, etc. And yes, all of that has happened, but it took 30 years and COVID to make it happen.
LLMs might lead to AGI. Eventually.
Meanwhile every company that is spruiking that, and betting their business that that's going to happen before they run out of VC funding, is going to fail.
I think it will go in the opposite direction. Very massive closed-weight models that are truly miraculous and magical. But that would be sad because of all the prompt pre-processing that will prevent you from doing much of what you'd really want to do with such an intelligent machine.
I expect it to eventually be a duopoly like android and iOS. At world scale, it might divide us in a way that politics and nationalities never did. Humans will fall into one of two AI tribes.
Except that we've seen that bigger models don't really scale well in accuracy/intelligence; just look at GPT-4.5. Intelligence scales logarithmically with parameter count; the extra parameters are mostly good for baking in more knowledge so you don't need to RAG everything.
Additionally, you can use reasoning model thinking with non-reasoning models to improve output, so I wouldn't be surprised if the common pattern was routing hard queries to reasoning models to solve at a high level, then routing the solution plan to a smaller on device model for faster inference.
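A minimal sketch of that routing idea, with made-up model names and a stand-in call_model() helper rather than any real API, could look something like this:

    # Hypothetical sketch of query routing: hard queries go to a large reasoning
    # model for a high-level plan, then a small local model executes the plan.
    def call_model(model: str, prompt: str) -> str:
        # Stand-in: replace with whatever inference backend you actually use.
        raise NotImplementedError

    def looks_hard(query: str) -> bool:
        # Crude heuristic; in practice this could itself be a small classifier.
        return len(query) > 500 or any(w in query.lower() for w in ("prove", "design", "plan"))

    def answer(query: str) -> str:
        if looks_hard(query):
            plan = call_model("big-reasoning-model", f"Outline a step-by-step plan for: {query}")
            return call_model("small-local-model",
                              f"Follow this plan to answer the question.\nPlan:\n{plan}\nQuestion: {query}")
        return call_model("small-local-model", query)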
Exactly. If some company ever does come up with an AI that is truly miraculous and magical the very last thing they'll do is let people like you and me play with it at any price. At best, we'd get some locked down and crippled interface to heavily monitored pre-approved/censored output. My guess is that the miracle isn't going to happen.
If I'm wrong though and some digital alchemy finally manages to turn our facebook comments into a super-intelligence we'll only have a few years of an increasingly hellish dystopia before the machines do the smart thing and humanity gets what we deserve.
By the time the capital runs out, I suspect we'll be able to get open models at the level of current frontier and companies will buy a server ready to run it for internal use and reasonable pricing. It will be useful but a complete commodity.
I know folks now that are selling, basically, RAG on Llamas, "in a box". Seems like a bunch of mid-level managers at SMEs are ready to burn budget on hype (to me). Gotta get something deployed in the hype cycle for the quarterly bonus.
I think we can already get open-weight frontier class models today. I've run Deepseek R1 at home, and it's every bit as good as any of the ChatGPT models I can use at work.
Which companies? Google and Microsoft are only up a little over the past several years, and I doubt much of their valuation is coming from LLM hype. Most of the discussions about x.com say it's worth substantially less than some years ago.
I feel like a lot of people mean that OpenAI is burning through venture capital money. It's debatable, but it's a huge jump to go from that to thinking it's going to crash the stock market (OpenAI isn't even publicly traded).
The "Magnificent Seven" stocks (Apple, Amazon, Alphabet, Meta, Microsoft, Nvidia, and Tesla) were collectively up >60% last year and are now 30% of the entire S&P500. They are all heavily invested in AI products.
I just checked the first two, Apple and Amazon, and they're trading 28% and 23% higher than they were 3 years ago. Annualized returns from the S&P 500 have been a little over 10%. Some of that comes from dividends, but Apple and Amazon pay out very little in dividends.
I'm not going to check all of the companies, but at least looking at the first two, I'm not really seeing anything out of the ordinary.
Currently, Nvidia enjoys a ton of the value capture from the LLM hype. But that's a weird state of affairs and once LLM deployments are less dependent on Nvidia hardware, the value capture will likely move to software companies. Or the LLM hype will reduce to the point that there isn't a ton of value to capture here anymore. This tech may just get commoditized.
Nvidia is trading below its historical P/E from pre-AI times at this point. This is just on confirmed revenue, and its profitability keeps increasing. NVIDIA is undervalued right now.
Sure, as long as it keeps selling $130B worth of GPUs each year. Which is entirely predicated on the capital investment in machine learning attracting revenue streams that are still imaginary at this point.
> None of that means that the current companies will be profitable ... The future could easily be "Open-weight models are moderately useful for some niches, no-name cloud providers charge slightly higher than the cost of electricity to use them at low profit margins".
They just need to stay a bit ahead of the open source releases, which is basically the status quo. The leading AI firms have a lot of accumulated know-how wrt. building new models and training them, that the average "no-name cloud" vendor doesn't.
> They just need to stay a bit ahead of the open source releases, which is basically the status quo
No, OpenAI alone needs approximately $5B of additional cash each and every year.
I think Claude is useful. But if they charged enough money to be cashflow positive, it's not obvious enough people would think so. Let alone enough money to generate returns to their investors.
I don't doubt the first part, but how true is the second?
Is there a shortage of React apps out there that companies are desperate for?
I'm not having a go at you--this is a genuine inquiry.
How many average people are feeling like they're missing some software that they're able to prompt into existence?
I think if anything, the last few years have revealed the opposite: that there's a huge surplus of people in the greater software business relative to the demand when money isn't cheap.
I think anyone in the "average" range of skill looking for a job can attest to the difficulties in finding a new/any job.
I think there is plenty of demand for software but not enough economic incentive to fulfill every single demand. Even for the software that is being worked on, we are constantly prioritizing between the features we need or want, deciding whether to write our own vs modifying something open source etc etc. You can also look at stuff like electron apps which is a hack to reduce programmer dev time and time to market for cross platform apps. Ideally, you should be writing highly performant native apps for each.
IMO if coding models get good enough to replace devs, we will see an explosion of software before it flattens out.
We're several years in now, and have lots of A:B comparisons to study across orgs that allowed and prohibited AI assistants. Is one of those groups running away with massive productivity gains?
Because I don't think anybody's noticed that yet. We see layoffs that make sense on their own after a boom, and they cut across AI-friendly and -unfriendly orgs. But we don't seem to see anybody suddenly breaking out with 2x or 5x or 10x productivity gains on actual deliverables. In contrast, the enshittening just seems to be continuing as it has for years, and the pace of new products and features is holding steady. No?
> We're several years in now, and have lots of A:B comparisons to study across orgs that allowed and prohibited AI assistants. Is one of those groups running away with massive productivity gains?
You mean... two years in? Where was the internet 2 years into it?
You’re not making the argument you think you’re making when you ask “Where was the [I]nternet 2 years into it?”
You may be intending to refer to 1971 (about two years after the creation of ARPANet) but really the more accurate comparison would be to 1995 (about two years since ISPs started offering SLIP/PPP dialup to the general public for $50/month or less).
And I think the comparison to 1995, the year of the Netscape IPO and URLs starting to appear in commercials and on packaging for consumer products, is apt: LLMs have been a research technology for a while; it’s their availability to the general public that’s new in the last couple of years. Yet while the scale of hype is comparable, the products aren’t: LLMs still don’t do anything remotely like what their boosters claim, and have done nothing to justify the insane amounts of money being poured into them. With the Internet, however, there were already plenty of retailers starting to make real money doing electronic commerce by 1995, not just by providing infrastructure and related services.
It’s worth really paying attention to Ed Zitron’s arguments here: The numbers in the real world just don’t support the continued amount of investment in LLMs. They’re a perfectly fine area of advanced research but they’re not a product, much less a world-changing one, and they won’t be any time soon due to their inherent limitations.
They're not a product? Isn't Cursor on the leaderboard for fastest to $100m ARR? What about just plain usage or dependency? College kids are using Chrome extensions that direct their searches to ChatGPT by default. I think your connection to internet uptake is a bit weak, and then you've ended by basically saying too much money is being thrown at this stuff, which is quite disconnected from the start of your argument.
I think it's pretty fair to say that they have close to doubled my productivity as a programmer. My girlfriend uses ChatGPT daily for her work, which is not "tech" at all. It's fair to be skeptical of exactly how far they can go but a claim like this is pretty wild.
Both your and her usage is currently being subsidized by venture capital money.
It remains to be seen how viable this casual usage actually is once this money dries up and you actually need to pay per prompt.
We'll just have to see where the pricing will eventually settle, before that we're all just speculating.
> And I think the comparison to 1995, the year of the Netscape IPO and URLs starting to appear in commercials and on packaging for consumer products, is apt
My grandfather didn’t care about these and you don’t care about LLMs, we get it
> They’re a perfectly fine area of advanced research but they’re not a product
No, it lets good engineers parallelize work. I can be adding a route to the backend while Cline with Sonnet 3.7 adds a button to the frontend. Boilerplate work that would take 20-30 minutes is handled by a coding agent. With Claude writing some of the backend routes with supervision, you've got a very efficient workflow. I do something like this daily in a 80k loc codebase.
I look forward to good standard integrations to assign a ticket to an agent and let it go through CI and put up a preview deploy & PR. I think there are lots of smaller issues that could be raised and sorted without much intervention.
Even if the VC-backed companies jacked up their prices, the models that I can run on my own laptop for "free" now are magical compared to the state of the art from 2 years ago. Ubiquity may come from everyone running these on their own hardware.
Takes like yours are just crazy given the pace of things. We can argue all day about whether people are "too bullish" or about the literal market size of enterprise AI, but truly, absolutely no one knows how good these things will get and the problems they'll overcome in the next 5 years. You saying "I am confused why people are still bullish" is implicitly building in some huge assumptions about the near future.
Most “AI” companies are simply wrapping the ChatGPT API in some form. You can tell from the job posts.
They aren’t building anything themselves. I find this to be disingenuous at best, and a sign to me of a bubble.
I also think that re-branding Machine Learning as AI to also be disingenuous.
These technologies of course have their use cases and excel at some things, but this isn’t the ushering in of actual, sapient intelligence, which for the majority of the term’s existence was the de facto agreed standard for the term “AI”. This technology lacks the actual markers of what is generally accepted as intelligence to begin with.
Remember the quote that IBM thought there would be a total market for maybe 10 or 15 computers in the entire world? They were giant, and expensive, and very limited in application.
A popular myth; it seems to have been made up from a way-less-interesting statement about a single specific model of computer during a 1953 stockholder meeting:
> IBM had developed a paper plan for such a machine and took this paper plan across the country to some 20 concerns that we thought could use such a machine. I would like to tell you that the [IBM 701] machine rents for between $12,000 and $18,000 a month, so it was not the type of thing that could be sold from place to place. But, as a result of our trip, on which we expected to get orders for five machines, we came home with orders for 18.
And that might have been true for a period of time. Advancements made it so they could become smaller and more efficient, and opened up a new market.
LLMs today feel like the former, but are being marketed as the latter. Fully believe that advancements will make them better, but in their current state they're being touted for their possibilities, not their actual capabilities.
I'm for using AI now as the tool they are, but AI is a while off taking senior development jobs. So when I see them being hyped for doing that it just feels like a hype bubble.
Tesla is valued based on the hope that it'll be the first to full self-driving cars. I don't think stock markets need to make sense; you invest in things that, if they pan out, could have huge growth. That's why LLMs are being invested in: alternatives will make you some ROI, but if LLMs do break through to major disruption in even a handful of large markets, your ROI will be huge.
That's not really true. Just the entertainment value alone is already causing OpenAI to rate limit its systems, and they're buying up significant amounts of NVIDIA's capacity, and NVIDIA itself is buying up significant portions of the entire world's chip-making budget. Even if just limited to entertainment, the value is immense, apparently.
That's a funny comparison. I can and do use cryptocurrency to pay for web hosting, a VPN and a few other things, as it's become the native currency of the internet. I love LLMs too, but while I agree with the parent comment that it's inevitable they'll be replaced with something better, Bitcoin seems to be sticking around for the long, long term.
In my office most people use chatGPT or a similar LLM every day. I don't know a single coworker that's ever used a cryptocurrency. One guy has bought some crypto stocks.
> The current valuation exists because the market thinks it can write code like a senior engineer and deliver AGI, because that's how the LLM providers market it.
No it's not. If it was valued for that it'd be at least 10X what it is now.
While it could be said that LLMs are in the 'peak of inflated expectations', blockchain is definitely still in the 'trough of disillusionment'. Even if there was a way for blockchain to affordably facilitate everyday transactions without destroying the planet and somehow sideloading into government acceptance, it's not clear that there would be anything novel enough to motivate people to use it vs a bank - beyond a complete collapse of the banking system.
Blockchain is here to stay; this is way past the point of "believing in the tech" - recently a wss:// order-book exchange (Hyperliquid) crossed $1T in volume traded, and they started in 2023.
Blockchains are becoming real-time data structures where everyone has admin-level read-only access to everything.
HN doesn't like blockchain. They had the chance to get in very early and now they're salty. I first heard about bitcoin on HN, before Silk Road made headlines.
It's more like duct-taping a VR headset to your head, calibrating your environment to a bunch of cardboard boxes and walls, and calling it a holodeck. It actually kinda works until you push at it too hard.
It reminds me a lot of when I first started playing No Man's Sky (the video game). Billions of galaxies! Exotic, one of a kind life forms on every planet! Endless possibilities! I poured hundreds of hours into the game! But, despite all the variety and possibilities, the patterns emerge, and every 'new' planet just feels like a first-person fractal viewer. Pretty, sometimes kinda nifty, but eventually very boring and repetitive. The illusion wore off, and I couldn't really enjoy it anymore.
I have played with a LOT of models over the years. They can be neat, interesting, and kinda cool at times, but the patterns of output and mistakes shatter the illusion that I'm talking to anything but a rather expensive auto-complete.
I'm in the same boat and I think it boils down to this: some people are actually quite passive, while others are more active in their use of technology.
It'd take more time for me to flesh this out than I want to give, but the basic idea is I am not just sitting there "expecting things". I've been puzzled too at why so many people don't seem to get it or are so frustrated like this lady, and in my observation this is their common element. It just looks very passive to me, the way they seem to use the machines and expect a result to be "given" to them.
PS. It reminds me very strongly of how our parent generation uses computers. Like the whole way of thinking is different, I cannot even understand why they would act certain ways or be afraid of acting in other ways, it's like they use a different compass or have a very different (and wrong) model in their head of how this thing in front of them works.
> And people just sit around, unimpressed, and complain that ... what ... it isn't a perfect superintelligence that understands everything perfectly?
IMO there are two distinct reasons for this:
1. You've got the Sam Altmans of the world claiming that LLMs are or nearly are AGI and that ASI is right around the corner. It's obvious this isn't true even if LLMs are still incredibly powerful and useful. But Sam doing the whole "is it AGI?" dance gets old really quickly.
2. LLMs are an existential threat to basically every knowledge worker job on the planet. Peoples' natural response to threats is to become defensive.
I’m not sure how anyone can claim number 2 is true, unless it’s someone who is a programmer doing mostly grunt code and thinks every knowledge worker job is similar.
Just off the top of my head there are plenty of knowledge worker jobs where the knowledge isn’t public, nor really in written form anywhere. There just simply wouldn’t be anything for AI to train on.
> LLMs are an existential threat to basically every knowledge worker job on the planet.
Given the typical problems of LLMs, they are not. You still need a human to check the results. It's like FSD: impressive when it works, bad when it doesn't, and scary because you never know beforehand when it's going to fail.
Yeah, the vast majority of what I spend my time on in a day isn’t something an LLM can help with.
My wife and I both work on and with LLMs and they seem to be, like… 5-10% productivity boosters on a good day. I’m not sure they’re even that good averaged over a year. And they don’t seem to be getting a lot better in ways that change that. Also, they’re that good if you’re good at using them and I can tell you most people really, really are not.
I remember when it was possible to be “good at Google”. It was probably a similar productivity boost. I was good at Google. Most (like, over 95% of) people were not, and didn’t seem to be able to get there, and… also remained entirely employable despite that.
how much time do I need to devote to see anything but garbage?
For reference, I program systems code in C/C++ in a large, proprietary codebase.
My experiences with OpenAI (a year ago or more), and more recently Cursor, Grok-v3 and Deepseek-r1, were all failures. The latter two started out OK and got worse over time.
What I haven't done is ask "AI" to whip up a more standard application. I have some ideas (an ncurses frontend to p4 written in Python, similar to tig, for instance), but haven't gotten around to it.
I want this stuff to work, but so far it hasn't. Now I don't think "programming" a computer in english is a very good idea anyway, but I want a competent AI assistant to pair program with. To the degree that people are getting results, to me it seems they are leveraging very high-level APIs/libraries of code which are not written by AI and solving well-solved, "common" problems(simple games, simple web or phone apps). Sort of like how people gloss over the heavy lifting done by language itself when they praise the results from LLMs in other fields.
I know it eventually will work. I just don't know when. I also get annoyed by the hype of folks who think they can become software engineers because they can talk to an LLM. Most of my job isn't programming. Most of my job is thinking about what the solution should be, talking to other people like me in meetings, understanding what customers really want beyond what they are saying, and tracking what I'm doing in various forms(which is something I really do want AI to help me with).
Vibe coding is aptly named because it's sort of the VB6 of the modern era. Holy cow! I wrote a Windows GUI App!!! It's letting non-programmers and semi-programmers (the "I write glue code in Python to munge data and API ins/outs" crowd) create usable things. Cool! So did spreadsheets. So did HyperCard. Andrej tweeting that he made a phone app was kinda cool but also kinda sad. If this is what the hundreds of billions spent on AI (and my bank account thanks you for that) delivers, then the bubble is going to pop soon.
I think there is a big problem of expectations. People are told that it is great for software development, so they try to use it on big existing software projects, and it sucks.
Usually that's because of context: LLMs are not very good at understanding a very large amount of context, but if you don't give LLMs enough context, they can't magically figure it out on their own. This relegates AI to only really being useful for pretty self-contained examples where the amount of code is small, and you can provide all the context it needs to do its job in a relatively small amount of text (few thousand words or lines of code at most).
That's why I think LLMs are only useful right now in real-world software development for things like one-off functions, new prototypes, writing small scripts, or automating lots of manual changes you have to do. For example, I love using o3-mini-high to take existing tests that I have and modify them to make a new test case. Often this involves lots of tiny changes that are annoying to write, and o3-mini-high can make those changes pretty reliably. You just give it a TODO list of changes, and it goes ahead and does it. But I'm not asking these models how to implement a new feature in our codebase.
I think this is why a lot of software developers have a bad view of AI. It's just not very good at the core software development work right now, but it's good enough at prototypes to make people freak out about how software development is going to be replaced.
That's not to mention that often when people first try to use LLMs for coding, they don't give the LLMs enough context or instructions to do well. Sometimes I will spend 2-3 minutes writing a prompt, but I often see other people putting the bare minimum effort into it, and then being surprised when it doesn't work very well.
Serious question, as someone who has also tried these things out and not found them very useful in the context of working on a large, complex codebase in neither Python nor JavaScript: when I imagine the amount of time it would take me to select some test cases, copy and paste them, and then think of a todo list or prompt to generate another case, even assuming the output is perfect, I feel like I’m getting close to the amount of time and mental effort it would take me to just write the test. In a way, having to ask in English for what I want in code adds an additional barrier for me: rather than just doing the thing, I have to also think of a promptable description. Is that not a problem? Is it just fast enough that it doesn’t matter? What’s the deal?
I mean, for me personally, I am writing out the English TODO list while I am figuring out exactly what changes I need to make. So, the thinking and writing the prompt take up the same unit of time.
And in terms of time saved, if I am just changing string constants, it’s not going to help much. But if I’m restructuring the test to verify things in a different way, then it is helpful. For example, recently I was writing tests for the JSON output of a program, using jq. In this case, it’s pretty easy to describe the tests I want to make in English, but translating that to jq commands is annoying and a bit tedious. But o3-mini-high can do it for me from the English very well.
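To make this concrete, the kind of check I mean, sketched in Python instead of jq (with a made-up command name and JSON fields), is roughly:

    import json
    import subprocess
    import unittest

    class JsonOutputTest(unittest.TestCase):
        # Hypothetical example: run a CLI tool and assert on its JSON output,
        # the same kind of check a short jq expression states more tersely.
        def test_report_lists_three_items(self):
            out = subprocess.run(
                ["mytool", "report", "--format=json"],  # made-up command
                capture_output=True, text=True, check=True,
            ).stdout
            data = json.loads(out)
            self.assertEqual(len(data["items"]), 3)
            self.assertTrue(all("id" in item for item in data["items"]))

    if __name__ == "__main__":
        unittest.main()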
Annoying to do myself, but easy to describe, is the sweet spot. It is definitely not universally useful, but when it is useful it can save me 5 minutes of tedium here or there, which is quite helpful. I think for a lot of this, you just have to learn over time what works and what doesn't.
Thanks for the reply, that makes sense. jq syntax is one of those things that I’m just familiar enough with to remember what’s possible but not how to do it, so I could definitely see an LLM being useful for that.
Maybe one of my problems is that I tend to jump into writing simple code or tests without actually having the end point clearly in mind. Often that works out pretty well. When it doesn’t, I’ll take a step back and think things through. But when I’m in the midst of it, it feels like being interrupted almost, to go figure out how to say what I want in English.
Will definitely keep toying with it to see where I can find some utility.
That definitely makes a lot of sense. I think if you are coding in a flow state on something, and LLMs interrupt that, then you should avoid them for those cases.
The areas that I've found LLMs work well for are usually small simple tasks I have to do where I would end up Googling something or looking at docs anyway. LLMs have just replaced many of these types of tasks for me. But I continue to learn new areas where they work well, or exceptions where they fail. And new models make it a moving target too.
> I think if you are coding in a flow state on something, and LLMs interrupt that, then you should avoid them for those cases.
Maybe that's why I don't like them. I'm always in a flow state, or reading docs and/or a lot of code to understand something. By the time I'm typing, I already know what exactly to write, and thanks to my vim-fu (and emacs-fu), getting it done is a breeze. Then comes the edit-compile-run, or edit-test cycle, and by then it's mostly tweaks.
I get why someone would generate boilerplate, but most of the time, I don't want the complete version from the get go. Because later changes are more costly, especially if I'm not fully sure of the design. So I want something minimal that's working, then go work on things that are dependent, then get back when I'm sure of what the interface should be. I like working iteratively which then means small edits (unless refactoring). Not battling with a big dump of code for a whole day to get it working.
Yeah, I think it matters a lot what type of work you do. I have to jump between projects a lot that are all in different languages with a lot of codebases I'm not deeply familiar with. So for me, LLMs are really useful to get up-to-speed on the knowledge I need to work on new projects.
If I've got a clear idea of what I want to write, there's no way I'm touching an LLM. I'm just going to smash out the code for exactly what I need. However, often I don't get that luxury as I'll need to learn different file system APIs, different sets of commands, new jargon, different standard libraries for the new languages, new technologies, etc...
It does an OK job with C#, but it's generally outdated code, e.g. [Required] as an attribute rather than the required keyword. Plus it occasionally generates some unnecessary constructors.
Mostly I use it for stupid template stuff, for which it isn't bad. It's not the second coming, but it definitely speeds you up.
> Most of my job is thinking about what the solution should be, talking to other people like me in meetings, understanding what customers really want beyond what they are saying, and tracking what I'm doing in various forms
None of this is particularly unique to software engineering. So if someone can already do this and add the missing component with some future LLM why shouldn’t they think they can become a software engineer?
Yeah I mean, if you can reason about, say, how an automobile engine works, then you can reason about how a modern computer works too, right? If you can discuss the tradeoffs in various engine design parameters then surely you understand Amdahl's law, caching strategies of a modern CPU, execution pipelining, etc... We just need to give those auto guys an LLM and then they can do systems software engineering, right?
Did you catch the sarcasm there?
Are you a manager by any chance? The non-coding parts of my job largely require domain experience. How does an LLM provide you with that?
If your mind has trouble expanding outside the domain of "use this well known tool to do something that has already been done" then no amount of improvements will free you from your belief that chatbots are glorified autocomplete.
I hear you, I'm tired of getting people that don't care to care. It's the people that should know how cool this stuff is and don't - they frustrate me!
You people frustrate me because you don't listen when I say that I've tried to use AI to help with my job and it fails horribly in every way. I see that it is useful to you, and that's great, but that doesn't make it useful for everybody... I don't understand why you must have everyone agree with you, and why it "tires" you out to hear other people's contradicting opinions. It feels like a religion.
I mean, it is trivial to show that it can do things literally impossible even 5 years ago. And you don't acknowledge that fact, and that's what drives me crazy.
It's like showing someone from 1980 a modern smart phone and them saying, yeah but it can't read my mind.
I'm not trying to pick on you or anything, but at the top of the thread you said "I mean, I can ask for obscure things with subtle nuance where I misspell words and mess up my question and it figures it out" and now you're saying "it is trivial to show that it can do things literally impossible even 5 years ago"
This leads me to believe that the issue is not that llm skeptics refuse to see, but that you are simply unaware of what is possible without them--because that sort of fuzzy search was SOTA for information retrieval and commonplace about 15 years ago (it was one of the early accomplishments of the "big data/data science" era) long before LLMs and deepnets were the new hotness.
This is the problem I have with the current crop of AI tools: what works isn't new and what's new isn't good.
It's also a red flag to hear "it is trivial to show that it can do things literally impossible even 5 years ago" 10 comments deep without anybody doing exactly that...
Are people really this hung up on the term “AI”? Who cares? The fact that this is a shockingly useful piece of technology has nothing to do with what it’s called.
Because the AI term makes people anthropomorphize those tools.
They "hallucinate", they "know", they "think".
They're just the result of matrix calculus, and your own pattern-recognition capacities fool you into thinking there is intelligence there. There isn't. They don't hallucinate; their output is wrong.
The worst example of anthropomorphism I've seen was the blog post from a researcher working on adversarial prompting. The tool spewing "help me" words made them think they were hurting a living organism https://www.lesswrong.com/posts/MnYnCFgT3hF6LJPwn/why-white-...
Speaking with AI proponents feels like speaking with cryptocurrencies proponents: the more you learn about how things work, the more you understand they don't and just live in lalaland.
If you lived before the invention of cars, and if when they were invented, marketers all said "these will be able to fly soon" (which of course, we know now wouldn't have been true), you would be underwhelmed? You wouldn't think it was extremely transformative technology?
Where does the premise come from that "artificial intelligence" is supposed to be infallible and superhuman? I think 20th century science fiction did a good job of establishing the premise that artificial intelligence will be sometimes useful but will often fail in bizarre ways that seem interesting to humans. Misunderstanding orders, applying orders literally in a way humans never would, or just flat out going haywire. Asimov's stories, HAL9000, countless others. These were the popular media tropes about artificial intelligence, and the "real deal" seems to line up with them remarkably well!
When businessmen sell me "artificial intelligence", I come prepared for lots of fuckery.
Have you considered that the problems you encounter in daily life just happen to be more present in the training data than problems other users encounter?
Stitching together well-known web technologies and protocols in well-known patterns, probably a good success rate.
Solving issues in legacy codebases using proprietary technologies and protocols, and non-standard patterns. Probably not such a good success rate.
I think you would benefit from a personalized approach. If you like, send me a Loom or similar of you attempting to complete one software task with AI, that fails as you said, and I'll give you my feedback. Email in profile.
Far from just programming too. They're useful for so many things. I use it for quickly coming up with shell scripts (or even complex piped commands (or if I'm being honest even simple commands since it's easier than skimming the man page)). But I also use it to bounce ideas off of when negotiating contracts. Or to give me a spoiler-free reminder of a plot point I'd forgotten in a book or TV series. Or to explain legal or taxation issues (which I of course verify, but it points me in the right direction). Or any number of other things.
As the parent says, while far from perfect, they're an incredible aid in so many areas. When used well, they help you produce not just faster but also better results. The only trick really is that you need to treat it as a (very knowledgeable but overconfident) collaborator rather than an oracle.
I love using it to generate boilerplate code for a new API I want to integrate. Much better than having to search manually. In the near future, not knowing how to effectively use AI to enhance productivity will be a disadvantage in the eyes of potential employers.
I use ChatGPT all the time. I really like it. It's not perfect; how I've described it (and I doubt that I'm unique in this): it's like having a really smart and eager intern at your disposal.
I say "intern" in the sense that its error-prone and kind of inexperienced, but also generally useful. I can ask it to automatically create a lot of the bootstrapping or tedious code that I always dread writing so that I can focus on the fun stuff, which is often the stuff that's pawned off onto interns and junior-level engineers. I think for the most part, when you treat it like that, it lives up to and sometimes even surpasses expectations.
I mean, I can't speak for everyone, but whenever I begin a new project, a large percentage of the first ~3 hours is simply copying and pasting and editing from documentation, either an API I have to call or some bootstrapping code from a framework or just some cruft to make built-in libraries work how you want. I hate doing all that, it actively makes me not want to start a new project. Being able to get ChatGPT to give me stuff that I need to actually get started on my project has made coding a lot more fun for me again. At this point, you can take my LLM from my cold dead hands.
I do think it will keep getting better, but I'm also at a point where even if it never improves I will still keep using it.
I guess it totally depends on the specific use cases.
Just yesterday, I asked Gemini 2.5 for how to test a Laravel artisan command that internally calls another artisan command using the Pest testing framework.
Gemini showed me a method that looked perfect.
But when I tried to execute it, I got an error that the method doesn't exist.
So I asked Gemini about this, and it told me: "Ah, apologies! That's a classic sign that you might be using a version of Laravel older than 5.7."
So I told it that I am actually using the latest release (12.3), to which it responded with: "You're right to question things when you get an error! However, let's clarify the Laravel version first.
As of today, March 27, 2025, the latest stable version of Laravel is Laravel 11, which was released in March 2024. Laravel 12 has not been released yet (it's expected roughly in Q1 2026 based on the usual schedule).
Could you please double-check the exact Laravel version you are using?"
So it did not believe me and I had to convince it first that I was using a real version. This went on for a while, with Gemini not only hallucinating stuff, but also being very persistent and difficult to convince of anything else.
Well, in the end it was still certain that this method should exist, even though it could not provide any evidence for it, and my searching through the internet and the Git history of the related packages also did not turn up any results.
So I gave up and tried it with Claude 3.7, which also could not provide a working solution.
In the end, I found an entirely different solution for my problem, but that wasn't based on anything the AIs told me, but just my own thinking and talking to other software developers.
I would not go so far as to call these AIs useless. In software development they can help with simple stuff and boilerplate code, and I found them a lot more helpful in creative work. This is basically the opposite of what I would have expected 5 years ago ^^
But for any important tasks, these LLMs are still far too unreliable.
They often feel like they have a lot of knowledge, but no wisdom.
They don't know how to apply their knowledge well, and they often basically brute-force it with a mix of strange creativity and statistical models apparently trained on a vast amount of internet content, a big part of which is troll content and satire.
My issue with higher ups pushing LLMs is that what slows me down at work is not having to write the code. I can write the code. If all I had to do was sit down and write code, then I would be incredibly productive because I'm a good programmer.
But instead, my productivity is hampered by issues with org communication, structure, siloed knowledge, lack of documentation, tech debt, and stale repos.
I have for years tried to provide feedback and get leadership to do something about these issues, but they do nothing and instead ask "How have you used AI to improve your productivity?"
I've had the same experience as you, and also rather recently. I had to learn two lessons: first, what I could trust it with (as with Wikipedia when it was new), and second, what makes sense to ask it (as with YouTube when it was new). Once I got that down, it is one fabulous tool to have on my belt, among many other tools.
Thing is, the LLMs that I use are all freeware, and they run on my gaming PC. Two to six tokens per second are alright honestly. I have enough other things to take care of in the meantime. Other tools to work with.
I don't see the billion dollar business. And even if that existed, the means of production would be firmly in the hands of the people, as long as they play video games. So, have we all tripled our salaries?
If we haven't, is that because knowledge work is a limited space that we are competing in, and LLMs are an equalizer because we all have them? Because I was taught that knowledge work was infinite. And the new tool should allow us to create more, and better, and more thoroughly. And that should get us all paid better.
Depends on your use case. If you don't need them to be the source of truth, then they work great, but if you do, the experience sucks because they're so unreliable.
The problems start when people start hyperventilating because they think since LLMs can generate tests for a function for you, that they'll be replacing engineers soon. They're only suitable for generating output that you can easily verify to be correct.
LLM training is designed to distill a massive corpus of facts, in the form of token sequences, into a much, much smaller bundle of information that encodes (somehow!) the deep structure of those facts minus their particulars.
They’re not search engines, they’re abstract pattern matchers.
I asked Grok to describe a picture I took of me and my kid at Hilton Head island. Based on the plant life, it guessed it was a southeast barrier island in Georgia or the Carolinas. It guessed my age and my son's age. LLMs are completely insane tech for a 90s kid. The first fundamental advance in tech I've seen in my lifetime - like what it must've been like for people who used a telephone for the first time, or watched a television.
Flat TVs, digital audio players (the iPod), the smartphone, laptops, smartwatches... You have a very selective definition of an advance in tech. Compare today (minus LLMs) with any movie depicting life in the nineties and you can see how much tech has developed.
There are basically 3 categories of LLM users (very roughly).
1. People creating or dealing with imprecise information. People doing SEO spam, people dealing with SEO spam, almost all creative arts people, people writing corporatese- or legalese- documents or mails, etc. For these tasks LLMs are god-like.
2. People dealing with precise information and/or facts. For these people, LLMs are no better than a parrot.
3. A subset of 2 - programmers. Because of the huge amount of stolen training data, plus almost perfect proofing software in the form of compilers, static analyzers etc., LLMs are more or less usable for this case; the more data was used, the better (JS is the best, as I understand).
This is why people's reaction is so polarizing. Their results differ.
The crisis in programming hasn’t been writing code. It has been developing languages and tools so that we can write less of it that is easy to verify as correct. These tools generate more code. More than you can read and more than you will want to before you get bored and decide to trust the output. It is trained on the most average code available that could be sucked up and ripped off the Internet. It will regurgitate the most subtle errors that humans are not good at finding. It only saves you time if you don’t bother reading and understanding what it outputs.
I don’t want to think about the potential. It may never materialize. And much of what was promised even a few years ago hasn’t come to fruition. It’s always a few years away. Always another funding round.
Instead we have massive amounts of new demand for liquid methane, infrastructure struggling to keep up, billions of gallons of fresh water wasted, all so that rich kids can vibe code their way to easy money and realize three months later they’ve been hacked and they don’t know what to do. The context window has been lost and they ran out of API credits. Welcome to the future.
Yeah, basically this. If I look at how it helps me as an individual, I can totally see how AI can sometimes be useful. If I look at the societal effect of AI, it becomes apparent that AI is just a net negative. Some examples:
- AI is great for disinformation
- AI is great at generating porn of women without their consent.
- Open source projects massively struggle as AI scrapers DDOS them.
- AI uses massive amounts of energy and water; most importantly, the expectation is that energy usage will rise drastically in a world where we need to lower it. If Sam Altman gets his way, we're toast.
- AI makes us intellectually lazy and worse thinkers. We were already learning less and less in school because of our impoverished attention span. This is even worse now with AI.
- AI makes us even more dependent on cloud vendors and third-parties, further creating a fragile supply chain.
Like AI ostensibly empowers us as individuals, but in reality I think it's a disservice, and the ones it truly empowers are the tech giants, as citizens become dumber and even more dependent on them and tech giants amass more and more power.
I can't believe I had to dig this deep to find this comment.
I have yet to see an AI-generated image that was "really cool".
AI images and videos strike me as the coffee pods of the digital world -- we're just absolutely littering the internet with garbage. And as a bonus, it's also environmentally devastating to the real world!
I live nearby a landfill, and go there often to get rid of yard waste, construction materials, etc. The sheer volume of perfectly serviceable stuff people are throwing out in my relatively small city (<200k) is infuriating and depressing. I think if more people visited their local landfills, they might get a better sense for just how much stuff humans consume and dispose. I hope people are noticing just how much more full of trash the internet has become in the last few years. It seems like it, but then I read this thread full of people that are still hyped about it all and I wonder.
This isn't even to mention the generated text... it's all just so inane and I just don't get it. I've tried a few times to ask for relatively simple code and the results have been laughable.
If you ask for obscure things, how do you know you are getting right answers? From my experience, unless the thing you are looking for is easily found with a Google search, LLMs have no hope of getting it correct. This is mostly from trying to code against an obscure API that isn't well documented, where the little documentation there is is spread across multiple wikis. And the LLMs keep hallucinating functions that simply do not exist.
It is an amazing technology and like crypto/blockchain it is nerdy to understand how it works and play with it. I think there are two things at stake here:
1. Some people are just uncomfortable with it because it “could” replace their jobs.
2. Some people are warning that the ecosystem bubble is significantly out of proportion. They are right, and having the whole stock market, companies and the US economy attached to LLMs is just downright irresponsible.
> Some people are just uncomfortable with it because it “could” replace their jobs.
What jobs are seriously at risk of being totally replaced by LLM's? Even in things like copywriting and natural language translation, which is somewhat of a natural "best case" for the underlying tech, their output is quite sub par compared to the average human's.
> And people just sit around, unimpressed, and complain that ... what ... it isn't a perfect superintelligence that understands everything perfectly
Hossenfelder is a scientist. There's a certain level of rigour that she needs to do her job, which is where current LLMs often fall down. Arguably it's not accelerating her work to have to check every single thing the LLM says.
I use them everyday and they save me so much time and enable me to do things that I wouldn't be able to do otherwise just due to the amount of time it would take.
I think some people just aren't using them correctly or don't understand their limitations.
They are especially helpful for getting me over thought paralysis when starting a new project.
The frustration of using an LLM is greater than the frustration of doing it myself. If it's going to be a tool, it needs to work. Otherwise, it's just a research toy.
They can do fun and interesting stuff, but we keep hearing how they’re going to replace human workers, and too many people in positions of power not only believe they are capable of this, but are taking steps to replace people with LLMs.
But while they are fun to play with, anything that requires a real answer, but can’t be directly and immediately checked, like customer support, scientific research, teaching, legal advice, identifying humans, correctly summarizing text - LLMs are very bad at these things, make up answers, mix contexts inappropriately, and more.
I’m not sure how you can have played with LLMs so much and missed this. I hope you don’t trust what they say about recipes or how to handle legal problems or how to clean things or how to treat disease or any fact-checking whatsoever.
>I’m not sure how you can have played with LLMs so much and missed this. I hope you don’t trust what they say about recipes or how to handle legal problems or how to clean things or how to treat disease or any fact-checking whatsoever.
This is like a GPT3.5 level criticism. o1-pro is probably better at pure fact retrieval than most PhDs in any given field. I challenge you to try it.
The main issue is that if you ask most LLMs to do something they aren't good at, they don't say "Sorry, I'm not sure how to do that yet." They say "Sure, absolutely! Here you go:" and proceed to make things up, provide numbers or code that don't actually add up, and invent references and sources.
To someone who doesn't actually check or have the knowledge or experience to check the output, it sounds like they've been given a real, useful answer.
When you tell the LLM that the API it tried to call doesn't exist it says "Oh, you're right, sorry about that! Here's a corrected version that should work!" and of course that one probably doesn't work either.
Yes. One of my early observations about LLMs was that we've now produced software that regularly lies to us. It seems to be a quite intractable problem. Also, since there's no real visibility as to how an LLM reaches a conclusion, there's no way to validate anything.
One takeaway from this is that labelling LLMs as "intelligent" is a total misnomer. They're more like super parrots.
For software development, there's also the problem of how up to date they are. If they could learn on the fly (or be constantly updated) that would help.
They are amazing in some ways, but they've been over-hyped tremendously.
I agree, they are an amazing piece of technology, but the investment and hype don't match the reality. This might age like milk, but I don't think OpenAI is going to make it. They burnt $9B to lose $5B in 2024, trying to raise money like their life depends on it... because their life depends on it. From what I can tell, none of the AI model producers are profiting from their model usage at this point, except maybe DeepSeek. There will be a market for this - they are useful, astonishingly impressive even - but IMO they are either going to become waaayy more expensive to use, and/or the market and investment will greatly shrink to something sustainable.
When I saw GPT-3 in action in 2023, I couldn’t believe my eyes. I thought I was being tricked somehow. I’d seen ads for “AI-powered” services and it was always the same unimpressive stuff. Then I saw GPT-3 and within minutes I knew it was completely different. It was the first thing I’d ever seen that felt like AI.
That was only a few years ago. Now I can run something on my 8GB MacBook Air that blows GPT-3 out of the water. It's just baffling to me when people say LLMs are useless or unimpressive. I use them constantly and I can still hardly believe they exist!!
LLMs are better at formally verifiable tasks like coding; coding also makes more money on a pure demand basis, so development for it gets more resources. In descriptive science fields it's not great, because those fields don't generate a lot of text compared to other things, so the training data is dwarfed by the huge corpus of general internet text. The software industry created the internet and loves using it, so it has published a lot more text in comparison. It can be really bad in bio, for example.
Is your testing adversarial or merely anecdotal curiosity? If you don't actively look for it why would you expect to find it?
It's bad technology because it wastes a lot of labor, electricity, and bandwidth in a struggle to achieve what most human beings can with minimal effort. It's also a blatant thief of copyrighted materials.
If you want to like it, guess what, you'll find a way to like it. If you try to view it from another person's use case, you might see why they don't like it.
> can ask for obscure things with subtle nuance where I misspell words and mess up my question and it figures it out. It talks to me like a person. It generates really cool images. It helps me write code. And just tons of other stuff that astounds me.
It is an impressive technology, but is it US$244.22bn [1] impressive (I know this stat is supposed to account for computer vision as well, but seeing as LLMs are now a big chunk of that, I think it's a safe assumption)? It's projected to grow to over US$1tr by 2031. That's higher than the market size of commercial aviation at its peak [2]. I'm sorry, but a cool chatbot is simply not as important as flying.
You no longer have the console as the primary interface, but a GUI, which 99.9+% of computer users control via a mouse.
You no longer have the screen as the primary interface, but an AUI, which 99.9+% of computer users control via a headset, earbuds, or a microphone and speaker pair.
You mostly speak and listen to other humans, and if you're not reading something they've written, you could have it read to you in order to detach from the screen or paper.
You'll talk with your computer while in the car, while walking, or while sitting in the office.
An LLM makes the computer understand you, and it allows you to understand the computer.
Even if you use smart glasses, you'll mostly talk to the computer generating the displayed results, and it will probably also talk to you, adding information to the displayed results. It's LLMs that enable this.
Just don't focus too much on whether the LLM knows how high Mount Kilimanjaro is; its knowledge of that fact is simply a hint that it can properly handle language.
Still, it's remarkable how useful they are at analyzing things.
LLMs have a bright future ahead, or whatever technology succeeds them.
I don't even dispute that they might become useful at some point, but when I point a mouse at a button and click it, it usually results in a reliable action.
When I use the LLM (I have so far tried: Claude, ChatGPT, DeepSeek, Mistral) it does something but that something usually isn’t what I want (~the linked tweet).
Prompting, studying and understanding the result, and then cleaning up the mess, for the low price of an expensive monthly sub, leaves me with worse results than if I did the thing myself; it usually takes longer and often leaves me with subtle bugs I'm genuinely afraid will grow into exploitable vulnerabilities.
Using it strictly as a rubber duck is neat but also largely pointless.
Since other people are getting something out of the tech, I’ll just assume that the hammer doesn’t fit my nails.
These are the beginnings and it will only improve. The premise is "I genuinely don't understand why some people are still bullish about LLMs", which I just can't share.
When the mouse and GUI were invented, nobody needed to say "just wait a couple years for it to improve and you'll understand why it's useful, until then please give me money". The benefits are immediately obvious and improve the experience for practically every computer user.
LLMs are very useful for some (mostly linguistic) tasks, but the areas where they're actually reliable enough to provide more value than just doing it yourself are narrow. But companies really need this tech to be profitable and so they try to make people use it for as many things as possible and shove it in everyone's face[0] in hopes that someone finds a use-case where the benefits are indeed immediately obvious and revolutionary.
[0] For example my dad's new Android phone by default opens a Gemini AI assistant when you hold the power button and it took me minutes of googling to figure out how to make it turn off the damn thing. Whoever at Google thought that this would make people like AI more is in the wrong profession.
It's like a mouse that some variable proportion of the time pretends it's moved the cursor and clicked a button, but actually it hasn't and you have to put a lot of work in to find out whether it did or didn't do what you expected.
It used to be annoying enough just having to clean the trackball, but at least you knew when it wasn't working.
I think it’s more that the people who are boosting LLMs are claiming that perfect super intelligence is right around the corner.
Personally, I look back at how many years ago it was that we were seeing claims that truck drivers were all going to lose their jobs and society would tear itself apart over it within the next few years… and yet here we still are.
I'm completely with you. The technology is absolutely fascinating in its own right.
That said, I do experience frustrations:
- Getting enraged when it messes up perfectly good code it wrote just 10 minutes ago
- Constantly reminding it we're NOT using jest to write tests
- Discovering it's created duplicate utilities in different folders
There's definitely a lot of hand-holding required, and I've encountered limitations I initially overlooked in my optimism.
But here's what makes it worthwhile: LLMs have significantly eased my imposter syndrome when it comes to coding. I feel much more confident tackling tasks that would have filled me with dread a year ago.
I honestly don't understand how everyone isn't completely blown away by how cool this technology is. I haven't felt this level of excitement about a new technology since I discovered I could build my own Flash movies.
It depends. For small tasks like summarization or self-contained code snippets, it’s really good—like figuring out how to inspect a binary executable on Linux, or designing a ranking algorithm for different search patterns. If you only want average performance or don’t care much about the details, it can produce reasonable results without much oversight.
But for larger tasks—say, around 2,000 lines of code—it often fails in a lot of small ways. It tends to generate a lot of dead code after multiple iterations, and might repeatedly fail on issues you thought were easy to fix. Mentally, it can get exhausting, and you might end up rewriting most of it yourself. I think people are just tired of how much we expect LLMs to deliver, only for them to fail us in unexpected ways. The LLM is good, but we really need to push to understand its limitations.
This is true. But it needs to be more than a toy if it is to be economically viable.
So far the industrial applications haven't been that promising, code writing and documentation is probably the most promising but even there it's not like it can replace a human or even substantially increase their productivity.
I think its perception of usefulness depends on how often you ask/google questions. If you are constantly wondering about X thing, LLMs are amazing - especially compared to previous alternatives like googling or asking on Reddit.
If you don’t constantly look for information, they might be less useful.
I'm a senior engineer with 20 years of experience, and I mostly find all of the AI bs of the last couple of years to be occasionally helpful for general stuff but absolutely incompetent when I need help with mildly complicated tasks.
I did have a eureka moment the other day with deepseek and a very obscure bug I was trying to tackle. One api query was having a very weird, unrelated side effect. I loaded up cursor with a very extensive prompt and it actually figured out the call path I hadn't been able to track down.
Today, I had a very simple task that eventually only took me half an hour to manually track. But I started with cursor using very similar context as the first example. It just kept repeatedly dreaming up non-existent files in the PR and making suggestions to fix code that doesn't exist.
So what's the worth to my company of my very expensive time? Should I spend 10, 20, or 50 percent of my time trying to get answers from a chatbot, or should I just use my 20 years of experience to get the job done?
I've been playing with Gemini 2.5 Pro, throwing all kinds of problems at it that will help me with personal productivity, and it's mostly one-shotting them. I'm still in disbelief, tbh.
A lot of people who don’t understand how to use LLM effectively will be at an economic disadvantage.
Can you give some examples? Do you mean things like "How do I control my crippling anxiety", things like "What highways would be best to take to Chicago", things like "Write me a Python library to parse the file format in this hex dump", or things like "What should I make for dinner"?
"The growth of the Internet will slow drastically, as the flaw in 'Metcalfe's law' becomes apparent: most people have nothing to say to each other! By 2005, it will become clear that the Internet's impact on the economy has been no greater than the fax machine's."
Same as reading books, Internet, Wikipedia, working towards/keeping your health and fitness, etc...
The quote about books being a mirror reflecting genius or idiocy seems to apply.
I see LLMs as a kind of hyper-keyboard: speeding up typing AND structuring content, completing thoughts, and inspiring ideas.
Unlike a regular keyboard, an LLM transforms input contextually. One no longer merely types but orchestrates concepts and modulates language, almost like music.
Yet mastery is key. Just as a pianist turns keystrokes into a symphony through skill, a true virtuoso wields LLMs not as a crutch but as an amplifier of thought.
As a 50+ nerd, for decades I carried the idea: can't we just build a sufficiently large neural net, throw some data at it, and have it somehow be usefully intelligent? So this is kind of showing strong signs of something I've been waiting for.
In the 70's I read in some science book for kids about how one day we will likely be able to use light emitting diodes for illumination instead of light bulbs, and this "cold light" will save us lots of energy. Waited out that one too; it turned out so.
I’m reminded of how I always think current cutting edge good examples of CG in movies looks so real and then, consistently, when I watch it again in 10 years it always looks distractingly shitty.
I honestly believe the GP comment demonstrates a level of gullibility that AI hypesters are exploiting.
Generative LLMs are text[1] generators, statistical machines extruding plausible text. To the extent that a human believes the output to be credible, it exhibits all the hallmarks of a confidence game. Once you know the trick, after Toto pulls back the curtain[2], it's not all that impressive.
1. I'm aware that LLMs can generate images and video as well. The point applies.
Perhaps you have already paid off your mortgage and saved up a million dollars for retirement? And you're not threatened by dismissal or salary reduction because supposedly "AI will replace everyone."
By the way, you don't need to be a 50+ year old nerd. Nerds are a special culture-pen where smart straight-A students from schools are placed so they can work, increase stakeholder revenues, and not even accidentally be able to do anything truly worthwhile that could redistribute wealth in society.
> And people just sit around, unimpressed, and complain that ... what ... it isn't a perfect superintelligence that understands everything perfectly
More like we note the frequency with which these tools produce shallow, bordering-on-useless responses, note the frequency with which they produce outright bullshit, and conclude their output should not be taken seriously. This smells like the fervor around ELIZA, but with several multinational marketing campaigns pushing behind it.
Yeah, like I. I. Rabi said in regard to people no longer being amazed by the achievements of physics, "What more do you want, mermaids?"
Anyone who remembers further back than a decade or so remembers when the height of AI research was chess programs that could beat grandmasters. Yes, LLMs aren't C3PO or the like, but they are certainly more like that than anything we could imagine just a few years ago.
The speed at which anything progresses is impressive if you're not paying attention while other people toil away on it for decades, until one day you finally look up and say, "Wow, the speed at which this thing progressed is insane!"
I remember seeing an AI lab in the late 1980's and thinking "that's never going to work" but here we are, 40 years later. It's finally working.
I'm glad I'm not the only person in awe of LLMs. It feels like they came straight out of a science fiction novel. What does it take to impress people nowadays?
I feel like if teleportation was invented tomorrow, people would complain that it can't transport large objects so it's useless.
I often ask "So you say LLMs are worthless because you can't blindly trust the first thing they say? Do you blindly trust the first google search result? Do you trust every single thing your family members tell you?" It reminds me of my high school teachers saying Wikipedia can't be trusted.
Yeah the amount of "piffle work" that LLMs save me is astounding. Sure, I can look up fifty different numbers and copy them into excel. Or I can just tell an LLM "make a chart comparing measurements XYZ across devices ABC" and I'm looking at the info right there.
Probably because you don't have the same use-case as them... doing "code" is an "easy" use-case, but pondering a humanities subject is much harder... you cannot "learn the structure" of the humanities; you have to know the facts... and LLMs are bad at that.
Because we're being told it is a perfect superintelligence, that it is going to replace senior engineers. The hype cycle is real, and worse than blockchain ever was. I'm sure LLMs will be able to code a full enterprise app around the same time moon coin replaces the US dollar.
I wholeheartedly agree with you and it’s funny reading the replies to your comment.
Basically, people are just doubling down on everything you described. I can't quite put a finger on it, but it has a tinge of insecurity or something like that. I hope that's not the case and I'm just misinterpreting.
It's like computer graphics and VR: Amazing advances over the years, very impressive, fun, cool, and by no means a temporary fad...
... But I do not believe we're on the cusp of a Lawnmower-Man future where someone's Metaverse eats all the retail-conference-halls and movie-theaters and retail-stores across the entire globe in an unbridled orgy of mind-shattering investor returns.
Similarly, LLMs are neat and have some sane uses, but the fervor about how we're about to invent the Omnimind and usher in the singularity and take over the (economic) world? Nah.
Today's models are far from autonomous thinking machines, and it is a cognitive bias among the masses to believe otherwise. It is just a giant calculator. It predicts "the most probable next word" from a sea of all combinations of next words.
I don't see it as a bigger leap than the internet itself. I recall needing books on my desk or a road trip to the local bookshop to find out coding answers. Stack Overflow beats AI most days, but the LLMs are another nice tool.
Exploring topics in a shallow fashion is fine with LLMs; doing anything deep is just too unreliable due to hallucination. All the models I've talked to desperately want to give a positive answer, and thus will often just lie.
Indeed, it is the stuff of science fiction, and then you get an "akshually, it's just statistics" comment. I feel people are projecting their fears, because deep down, they're simply afraid.
I like LLMs for what they are. Classifiers. I don’t trust them as search engines because of hallucinations. I use them to get a bearing on a subject but then I’ll turn to Google to do the real research.
I go back and forth. I share your amazement. I used Gemini Deep Research the other day and was blown away. It claimed to have read 20 websites, and it showed its "thinking" and steps, its conclusions at each step. Then it wrote a large summary (several pages).
On the other hand, I saw github recently added Copilot as a code reviewer. For fun I let it review my latest pull request. I hated its suggestions but could imagine a not too distant future where I'm required by upper management to satisfy the LLM before I'm allowed to commit. Similarly, I've asked ChatGPT questions and it's been programmed to only give answers that Silicon Valley workers have declared "correct".
The thing I always find frustrating about the naysayers is that they seem to think how it works today is the end of it. I recently listened to an episode of EconTalk interviewing someone on AI and education. She lives in the UK and used Tesla FSD as an example of how bad AI is. Yet I live in California and see Waymo mostly working today, with lots of people using it. I believe she wouldn't have used the Tesla FSD example, and would possibly have changed her world view at least a little, if she'd updated on seeing self-driving work.
What you're impressed with is 40% the human skill in creating an LLM, 0.5% value created by the model, and 59.5% the skills of all the people it ate and whose livelihoods it is now trying to destroy.
As others have pointed out already, the hype about writing code like a senior engineer, or in general acting as a competent assistant, is what created the expectation in the first place. They keep over-promising but under-delivering. Who is the guy talking about AGI most of the time? Could it be the top executive of one of the largest gen-AI companies, do you think? I won't deny it occasionally has a certain 'star-trek-computer' flair to it, but most of the time it feels like having a heavily degraded version of "Rain Man": he may count your cards perfectly one moment, then get stuck trying to untie his shoes. I stopped counting how many times it produced outright wrong outputs, to the point of suggesting literally the opposite of what one is asking. I would not mind it so much if they were being advertised for what they are, not for what they could potentially be if only another half a trillion dollars were invested in data centers. It is not going to happen with this technology; the issue is structural, not resource-related.
Really? I just get garbage. Both Claude and Copilot kept insisting that it was OK to use React hooks outside of function components. There have been many other situations where it gave me some code and, even after refining the prompt, it just gave me wrong or non-working code. I'm not expecting perfection, but at least don't waste my time with hallucinations or flat-out stuff that doesn't work.
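(For context, the rule the models kept getting wrong is well established: React hooks may only be called from the top level of a function component or a custom hook, never at module scope or inside plain functions. A minimal sketch; the Counter component here is just my own illustration:)

```tsx
import { useState } from "react";

// Invalid: calling a hook at module scope (or in any plain function)
// triggers React's "Invalid hook call" error at runtime.
// const [count, setCount] = useState(0);

// Valid: the hook is called at the top level of a function component.
export function Counter() {
  const [count, setCount] = useState(0);
  return <button onClick={() => setCount(count + 1)}>{count}</button>;
}
```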
> And people are like, "Wah, it can't write code like a Senior engineer with 20 years of experience!"
Except this isn't true. The code quality varies dramatically depending on what you're doing, the length of the chat/context, etc. It's an incredible productivity booster, but even earlier today, I wasted time debugging hallucinated code because the LLM mixed up methods in a library.
The problem isn't so much that it's not an amazing technology, it's how it's being sold. The people who stand to benefit are speaking as though they've invented a god and are scaring the crap out of people making them think everyone will be techno-serfs in a few years. That's incredibly careless, especially when as a technical person, you understand how the underlying system works and know, definitively, that these things aren't "intelligent" the way they're being sold.
Like the startups of the 2010s, everyone is rushing, lying, and huffing hopium deluding themselves that we're minutes away from the singularity.
You forget the large group of people who proudly declare they are inventing AGI and can make everyone lose their jobs and starve. The complaints are for them, not for you.
Keep in mind it understands nothing. The notion that LLMs understand anything is fundamentally flawed, as they do not demonstrate any markers of understanding.
The fact that you don't know what "Markov chain" means and get angry at others over it is what pisses me off. Both are Markov chains; that you used to erroneously think a Markov chain is a way to make a chatbot, rather than a general mathematical process, is on you, not them.
Not one of them has managed to generate a successful promise-based implementation of reCAPTCHA v2 in JavaScript from scratch (https://developers.google.com/recaptcha/docs/loading), even though they have a million+ references for this.
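(For reference, the shape of what's being asked for isn't exotic. Here's a minimal sketch in TypeScript of a promise-based wrapper, assuming the standard explicit-render flow from the linked docs; the helper names loadRecaptcha and renderAndGetToken, the container id, and the site key are my own placeholders, not anything from the docs:)

```ts
// Minimal sketch of promise-based reCAPTCHA v2 loading (explicit render).
// Assumes the grecaptcha API described in the linked docs; helper names are illustrative.
declare const grecaptcha: {
  render: (
    container: string | HTMLElement,
    params: { sitekey: string; callback: (token: string) => void }
  ) => number;
};

// Inject api.js and resolve once its onload callback fires.
function loadRecaptcha(): Promise<void> {
  return new Promise((resolve, reject) => {
    (window as any).onRecaptchaLoad = () => resolve();
    const script = document.createElement("script");
    script.src =
      "https://www.google.com/recaptcha/api.js?onload=onRecaptchaLoad&render=explicit";
    script.async = true;
    script.defer = true;
    script.onerror = () => reject(new Error("failed to load reCAPTCHA api.js"));
    document.head.appendChild(script);
  });
}

// Render a v2 widget into a container and resolve with the response token
// once the user completes the challenge.
function renderAndGetToken(container: string, sitekey: string): Promise<string> {
  return new Promise((resolve) => {
    grecaptcha.render(container, { sitekey, callback: resolve });
  });
}

// Usage (hypothetical container id and site key):
// await loadRecaptcha();
// const token = await renderAndGetToken("captcha-container", "YOUR_SITE_KEY");
```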
Because the marketers oversold it. That is why you are seeing a pushback. I also outright rejected them because 1) they were sold and marketed as end-all-be-all replacements for human thought, and 2) they promised to replace only the parts of my job that I enjoy. Billboards went up in San Francisco telling my "bosses" that I was soon to be replaced, and the loudest and earliest voices told me that the craft I love is dead. Imagine NASCAR drivers excitedly discussing how cool it is that they won't have to turn left anymore - it made me wonder why everyone else was even here.
It was, more or less, the same narrative arc as Bitcoin, and was (is) headed for a crash.
That said, I've spent a few weeks with augment, and it is revelatory, certainly. All the marketing - aimed at a suite I have no interest in - managed to convince me it was something it wasn't. It isn't a replacement, any more than a power drill is a replacement for a carpenter.
What it is, is very helpful. "The world's most fully functioning scaffolding script", an upgrade from Copilot's "the world's most fully functioning tab-completer". I appreciate its usefulness as a force multiplier, but I am already finding corners and places where I'd just prefer to do it myself. And this is before we get into the craft of it all - I am not excited by the pitch "worse code, faster", but the utility is undeniable on this capitalistic hell planet, and I'm not a huge fan of writing SQL queries anyway, so here we are!
For me, LLMs are a bit like being shown a talking dog with the education and knowledge of a first grader: a talking dog is amazing in itself, and a truly impressive technical feat; that said, you wouldn't have the dog file your taxes or represent you in court.
To quote Joel Spolsky, "When you’re working on a really, really good team with great programmers, everybody else’s code, frankly, is bug-infested garbage, and nobody else knows how to ship on time.", and that's the state we end up if we believe in the hype and use LLMs willy-nilly.
That's why people are annoyed: not because LLMs cannot code like a senior engineer, but because lots of content marketing and company valuations depend on making people believe they can.
I mean. How would you feel if you coded a menu in Python with certain choices but when you used it the choices were never the same or in the same order, sometimes there were fake choices, sometimes they are improperly labelled and sometimes the menu just completely fails to open. And you as a coder and you as a user have absolutely no control over any of those issues. Then, when you go online to complain people say useful stuff like "Isn't it amazing that it does anything at all!? Give us a break, we're working on it bro."
That's how I see LLMs and the hype surrounding them.
a lot of it is just plain denial. a certain subgenre of person will forever attack everything AI does because they feel threatened by it and a certain other subgenre of person will just copy this behaviour and parrot opinions for upvotes/likes/retweets.
I'll keep bringing up this example whenever people dismiss LLMs.
I can ask Claude the most inane programming question and get an answer. If I were to do that on StackOverflow, I'd get downvoted, rude comments, and my question closed for being off-topic. I don't have to be super knowledgeable about the thing I'm asking about with Claude (or any LLM, for that matter).
Even if you ignore the rudeness and elitism of power-users of certain platforms, there's no more waiting for someone to respond to your esoteric questions. Even if the LLM spews bullshit, you can ask it clarifying questions or rephrase until you see something that makes sense.
I love LLMs, I don't care what people say. Even when I'm just spitballing ideas[1], the output is great.
For me, I think they're valuable but also overhyped. They're not at the point of replacing entire dev teams, as some articles claim. In addition, they are amazingly accurate sometimes and amazingly misleading other times. I've noticed some ardent advocates ignore the latter.
It's incredibly frustrating when people think they're a miracle tool and blindly copy/paste output without doing any kind of verification. This is especially frustrating when someone who's supposed to be a professional in the field is doing it (copy-pasting non-working AI-generated code and putting it up for review).
That said, on one hand they multiply productivity and useful information. On the other hand, they kill productivity and spread misinformation. So I still see them as useful, but not a miracle.
I blame overpromised expectations from startups and public companies, screaming about AGI and superintelligence.
Truly amazing technology that is very good at generating and correcting text is being marketed as a senior developer, a talented artist, and a black box that has the solution to all your problems. This impression shatters on the first blatant mistake, e.g. counting elephant legs: https://news.ycombinator.com/item?id=38766512
It's the classic HN-like anti-anything bubble we see with JavaScript frameworks. Hundreds of thousands of people are productive with them and enjoy them. They created entire industries and job fields. The same is happening with LLMs, but the usual counter-culture dev crowd is denying it while it's happening right before their eyes. I too use LLMs every day. I never click on a link and find it doesn't exist. When I want to take my mind off of things, I just talk with GPT.
You're being disingenuous. The tweet was talking about asserting the existence of fake articles, claiming that a paper was written in one year while summarizing a paper that explicitly says it was written in another, and severe hallucinations. Nowhere does she even imply that she's looking for superintelligence.
What I find interesting is that my experience has been 100% the opposite. I've been using ChatGPT, Claude, and Gemini for almost a year (well, only ChatGPT for the full year, since the rest are more recent). I've been using them to help build circuits and write code. They are almost always wrong with circuit design, and create code that doesn't work north of 80% of the time. My patience has dropped off to the point where I only experiment with LLMs a few times a week because they are so bad. Yes, it is miraculous that we can have a conversation, but it means nothing if the output is always wrong.
But I will admit the dora muckbang feet shit is fucking insane. And that just flat out scares the pants off me.
>They are almost always wrong with circuit design, and create code that doesn’t work north of 80% of the time.
Sorry but this is a total skill issue lol. 80% code failure rate is just total nonsense. I don't think 1% of the code I've gotten from LLMs has failed to execute correctly.
LLMs can't be trusted. They are like an overconfident idiot who is pretending quite impressively, but if you check the result there's just a bit too much bullshit in it. So there's practically zero gain in using LLMs except when you actually need a text that's nice, eloquent bullshit.
Almost every time I've tried using LLMs, I've fallen into the pattern of calling out, correcting, and arguing with the LLM, which is of course silly in itself, because they don't learn; they don't really "get it" when they are wrong. There's none of the benefit of talking to a human.
This is the place where tech shiny meets actual use cases, and users aren’t really good at articulating their problems.
It's also a slow-burn issue - you have to use it for a while for what is obvious to users to become obvious to people who are tech-first.
The primary issue is the hype and forecasted capabilities vs actual use cases. People want something they can trust as much as an authority, not as much as a consultant.
If I were to put it in a single sentence? These are primarily narrative tools, being sold as factual /scientific tools.
When this is pointed out, the conversation often shifts to “well people aren’t that great either”. This takes us back to how these tools are positioned and sold. They are being touted as replacements to people in the future. When this claim is pressed, we get to the start of this conversation.
Frankly, people on HN aren’t pessimistic enough about what is coming down the pipe. I’ve started looking at how to work in 0 Truth scenarios, not even 0 trust. This is a view held by everyone I have spoken to in fraud, misinformation, online safety.
There’s a recent paper which showed that GAI tools improved the profitability of Phishing attempts by something like 50x in some categories, and made previously loss making (in $/hour terms) targets, profitable. Schneier was one of the authors.
A few days ago I found out someone I know who works in finance, had been deepfaked and their voice/image used to hawk stock tips. People were coming to their office to sue them.
I love tech, but this is the dystopia part of cyberpunk being built. These are narrative tools, good enough to make people think they are experts..
The thing LLMs are really really good at, is sounding authoritative.
If you ask it random things the output looks amazing, yes. At least at first glance. That's what they do. It's indeed magical, a true marvel that should make you go: Woooow, this is amazing tech: Coming across as convincing, even if based on hallucinations, is in itself a neat trick!
But is it actually useful? The things they come up with are untrustworthy and on the whole far less good than previously available systems. In many ways, insidiously worse: It's much harder to identify bad information than it was before.
It's almost like we designed a system to pass Turing tests with flying colours but forgot that usefulness is what we actually wanted, not authoritative, human-sounding bullshit.
I don't think the LLM naysayers are 'unimpressed', or that they demand perfection. I think they are trying to make statements aimed at balancing things:
Both the LLMs themselves, and the humans parroting the hype, are severely overstating the quality of what such systems produce. Hence, and this is a natural phenomenon you can observe in all walks of life, the more skeptical folks tend to swing the pendulum the other way, and thus it may come across to you as them being overly skeptical instead.
I totally agree, and this community is far from the worst. In trans communities there's incredible hostility towards LLMs - even local ones. "You're ripping off artists", "A pissing contest for tech bros", etc.
I'm trans, and I don't disagree that this technology has aspects that are problematic. But for me at least, LLMs have been a massive equalizer in the context of a highly contentious divorce where the reality is that my lawyer will not move a finger to defend me. And he's lawyer #5 - the others were some combination of worse, less empathetic, and more expensive. I have to follow up a query several times to get a minimally helpful answer - it feels like constant friction.
ChatGPT was a total game-changer for me. I told it my ex was using our children to create pressure - feeding it snippets of chat transcripts. ChatGPT suggested this might be indicative of coercive control abuse. It sounded very relevant (my ex even admitted one time, in a rare, candid moment, that she feels a need to control everyone around her), so I googled the term - essentially all the components were there except physical violence (with two notable exceptions).
Once I figured that out, I asked it to tell me about laws related to controlling relationships - and it suggested both laws directly addressing it (in the UK and Australia) and the closest laws in Germany (Nötigung, Nachstellung, violations of dignity, etc.), translating them to English - my best language. Once you name specific laws broken and provide a rationale for why there's a Tatbestand (i.e., the criterion for a violation is fulfilled), your lawyer has no option but to take you more seriously. Otherwise he could face a malpractice suit.
Sadly, even after naming specific law violations and pointing to email and chat evidence, my lawyer persists in dragging his feet - so much so that the last legal letter he sent wasn't drafted by him - it was ChatGPT. I told my lawyer: read, correct, and send to X. All he did was to delete a paragraph and alter one or two words. And the letter worked.
Without ChatGPT, I would be even more helpless and screwed than I am. It's far from clear I will get justice in a German court, but at least ChatGPT gives me hope, a legal strategy. Lastly - and this is a godsend for a victim of coercive control - it doesn't degrade you. Lawyers do. It completely changed the dynamics of my divorce (4 years - still no end in sight, lost my custody rights, then visitation rights, was subjected to confrontational and gaslighting tactics by around a dozen social workers - my ex is a social worker -, and then I literally lost my hair: telogen effluvium, tinea capitis, alopecia areata... if it's stress-related, I've had it), it gave me confidence when confronting my father and brother about their family violence.
It's been the ONLY reliable help, frankly, so much so I'm crying as I write this. For minorities that face discrimination, ChatGPT is literally a lifeline - and that's more true the more vulnerable you are.
I agree. I recently asked if a certain GPU would fit in a certain computer... and it understood that "fit" could mean physically inside but could also mean that the interface is compatible, and it answered both.
It did. It mentioned PCIe connectors, what connects to what, and said this computer has a motherboard with such-and-such PCIe, the card needs such-and-such, so it's compatible. Regarding physical size, it said that it depends on the size of the case (implying that it understood that the size of the card is known but the size of the computer isn't known to it).
It's quite insulting that you just assume I don't know how to read specs. You're either assuming based on nothing, or you're inferring from my comment in which case I worry for your reading comprehension. At no point did I say I didn't know how to find the answer or indeed that I didn't know the answer.
TBH, they produce trash results for almost any question I might want to ask them. This is consistently the case. I must use them differently than other people.
LLMs produce midwit answers. If you are an expert in your domain, the results are kind of what you would expect for someone who isn’t an expert. That is occasionally useful but if I wanted a mediocre solution in software I’d use the average library. No LLM I have ever used has delivered an expert answer in software. And that is where all the value is.
I worked in AI for a long time, I like the idea. But LLMs are seemingly incapable of replacing anything of value currently.
The elephant in the room is that there is no training data for the valuable skills. If you have to rely on training data to be useful, LLMs will be of limited use.
Here’s when we can start getting excited about LLMs: when they start making new and valid scientific discoveries that can radically change our world.
When an AI can say “Here’s how you make better, smaller, more powerful batteries, follow these plans”, then we will have a reason to worship AI.
When AI can bring us wonders like room-temperature superconductors, fast interstellar travel, anti-gravity tech, and solutions to world hunger and energy consumption, then it will have fulfilled the promise of what AI could do for humanity.
Until then, LLMs are just fancy search and natural language processors. Puppets with strings. It’s about as impressive as Google was when it first came out.
And people just sit around, unimpressed, and complain that ... what ... it isn't a perfect superintelligence that understands everything perfectly? This is the most amazing technology I've experienced as a 50+ year old nerd that has been sitting deep in tech for basically my whole life. This is the stuff of science fiction, and while there totally are limitations, the speed at which it is progressing is insane. And people are like, "Wah, it can't write code like a Senior engineer with 20 years of experience!"
Crazy.