I get so confused on this. I play around, test, and mess with LLMs all the time and they are miraculous. Just amazing, doing things we dreamed about for decades. I mean, I can ask for obscure things with subtle nuance where I misspell words and mess up my question and it figures it out. It talks to me like a person. It generates really cool images. It helps me write code. And just tons of other stuff that astounds me.
And people just sit around, unimpressed, and complain that ... what ... it isn't a perfect superintelligence that understands everything perfectly? This is the most amazing technology I've experienced as a 50+ year old nerd who has been sitting deep in tech for basically my whole life. This is the stuff of science fiction, and while there totally are limitations, the speed at which it is progressing is insane. And people are like, "Wah, it can't write code like a senior engineer with 20 years of experience!"
The technology is not just less than superintelligence, for many applications it is less than prior forms of intelligence like traditional search and Stack Exchange, which were easily accessible 3 years ago and are in the process of being displaced by LLMs. I find that outcome unimpressive.
And this Tweeter's complaints do not sound like a demand for superintelligence. They sound like a demand for something far more basic than the hype has been promising for years now.
- "They continue to fabricate links, references, and quotes, like they did from day one."
- "I ask them to give me a source for an alleged quote, I click on the link, it returns a 404 error." (Why have these companies not manually engineered out a problem like this by now? Just do a check to make sure links are real. That's pretty unimpressive to me.)
- "They reference a scientific publication, I look it up, it doesn't exist."
- "I have tried Gemini, and actually it was even worse in that it frequently refuses to even search for a source and instead gives me instructions for how to do it myself."
- "I also use them for quick estimates for orders of magnitude and they get them wrong all the time. "
- "Yesterday I uploaded a paper to GPT to ask it to write a summary and it told me the paper is from 2023, when the header of the PDF clearly says it's from 2025. "
A municipality in Norway used an LLM to create a report about the school structure in the municipality (how many schools there are, how many there should be, where they should be, how big they should be, pros and cons of different school and class sizes, etc.). Turns out the LLM invented scientific papers to use as references and the whole report is complete and utter garbage based on hallucinations.
I agree. I use LLMs heavily for gruntwork development tasks (porting shell scripts to Ansible is an example of something I just applied them to). For these purposes, they work well. LLMs excel in situations where you need repetitive, simple adjustments on a large scale, e.g. swap every Postgres insert query with the corresponding MySQL insert query.
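For illustration, here's a minimal sketch of the kind of mechanical rewrite I mean. The regex and the single dialect rule are simplified stand-ins rather than a complete converter, and the table/column names are made up:

```python
import re

def pg_insert_to_mysql(query: str) -> str:
    """Translate a Postgres 'ON CONFLICT DO NOTHING' insert into MySQL's
    'INSERT IGNORE'; anything else is returned unchanged for human review."""
    pattern = re.compile(
        r"^\s*INSERT\s+INTO\s+(.*?)\s+ON\s+CONFLICT\s+DO\s+NOTHING\s*;?\s*$",
        re.IGNORECASE | re.DOTALL,
    )
    m = pattern.match(query)
    if m:
        return f"INSERT IGNORE INTO {m.group(1)};"
    return query  # anything this can't handle gets flagged for manual review

print(pg_insert_to_mysql(
    "INSERT INTO users (id, name) VALUES (1, 'a') ON CONFLICT DO NOTHING;"
))
# INSERT IGNORE INTO users (id, name) VALUES (1, 'a');
```

The LLM's job is essentially hundreds of edits of this shape, which is also why the output is worth spot-checking: ON CONFLICT and INSERT IGNORE are not semantically identical in every case.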
A lot of the "LLMs are worthless" talk I see tends to follow this pattern:
1. Someone gets an idea, like feeding papers into an LLM, and asks it to do something beyond its scope and proper use-case.
2. The LLM, predictably, fails.
3. Users declare not that they misused the tool, but that the tool itself is fundamentally broken.
To my mind it's no different from the steamroller being invented, and people remarking on how well it flattens asphalt. Then a vocal group tries to use this flattening device to iron clothing in bulk, and declares steamrollers useless when they fail at this task.
>swap every Postgres insert query with the corresponding MySQL insert query.
If the data and relationships in those insert queries matter, at some unknown future date you may find yourself cursing your choice to use an LLM for this task. On the other hand you might not ever find out and just experience a faint sense of unease as to why your customers have quietly dropped your product.
I’ve already seen people completely mess things up. It’s hilarious. Someone who thinks they’re in “founder mode” and a “software engineer” because ChatGPT or their Cursor vomited out 800 lines of Python code.
The vileness of hoping people suffer aside, anyone who doesn’t have adequate testing in place is going to fail regardless of whether bad code is written by LLMs or Real True Super Developers.
What vileness? These are people who are gleefully sidestepping things they don't understand and putting tech debt onto others.
I'd say maybe up to 5-10 years ago, there was an attitude of learning something to gain mastery of it.
Today, it seems like people want to skip levels which eventually leads to catastrophic failure. Might as well accelerate it so we can all collectively snap out of it.
The mentality you're replying to confuses me. Yes, people can mess things up pretty badly with AI. But I genuinely don't understand the assumption that anyone using AI is also skipping basic testing or code review.
Right, which is why you go back and validate code. I'm not sure why the automatic assumption that implementing AI in a workflow means you blindly accept the outputs. You run the tool, you validate the output, and you correct the output. This has been the process with every new engineering tool. I'm not sure why people assume first that AI is different, and second that people who use it are all operating like the lowest common denominator AI slop-shop.
In this analogy, are all the steamroller manufacturers loudly proclaiming how well it 10xes the process of bulk ironing clothes?
And is a credulous executive class en masse buying into that steam roller industry marketing and the demos of a cadre of influencer vibe ironers who’ve never had to think about the longer term impacts of steam rolling clothes?
Thank you for mentioning that! What a great example of something an LLM can do pretty well that otherwise can take a lot of time looking up Ansible docs to figure out the best way to do things. I'm guessing the outputs aren't as good as what someone really familiar with Ansible could produce, but it's a great place to start! It's such a good idea that it seems obvious in hindsight now :-)
Exactly, yeah. And once you look over the Ansible, it's a good place to start and expand. I'll often have it emit Helm charts for me as templates, then after the tedious setup of the Helm chart is done, the rest of it is me manually doing the complex parts and customizing in depth.
Plus, it's a generic question; "give me a Helm chart for Velero that does x, y, and z" is as proprietary as me doing a Google search for the same, so you're not giving proprietary source code to OpenAI/wherever, and that's one fewer thing to worry about.
Yeah, I tend to agree. The main reason that I use AI for this sort of stuff is it also gives me something complete that I can then ask questions about, and refine myself. Rather than the fragmented documentation style "this specific line does this" without putting it in the context of the whole picture of a completed sample.
I'm not sure if it's a facet of my ADHD, or mild dyslexia, but I find reading documentation very hard. It's actually a wonder I've managed to learn as much as I have, given how hard it is for me to parse large amounts of text on a screen.
Having the ability to interact with a conversational type documentation system, then bullshit check it against the docs after is a game changer for me.
that's another thing! people are all "just read the documentation". the documentation goes on and on about irrelevant details, how do people not see the difference between "do x with library" -> "code that does x", and having to read a bunch of documentation to make a snippet of code that does the same x?
I'm not sure I follow what you mean, but in general yes. I do find "just read the docs" to be a way to excuse not helping team members. Often docs are not great, and tribal knowledge is needed. If you're in a situation where you're either working on your own and have no access to that, or in a situation where you're limited by the team member's willingness to share, then AI is an OK alternative within limits.
Then there's also the issue that examples in documentation are often very contrived, and sometimes more confusing. So there's value in "work this up to do such and such an operation" sometimes. Then you can interrogate the functionality better.
No, it says that people dislike liars. If you are known for making up things constantly, you might have a harder time gaining trust, even if you're right this time.
1. LLMs have been massively overhyped, including by some of the major players.
2. LLMs have significant problems and limitations.
3. LLMs can do some incredibly impressive things and can be profoundly useful for some applications.
I would go so far as to say that #2 and #3 are hardly even debatable at this point. Everyone acknowledges #2, and the only people I see denying #3 are people who either haven't investigated or are so annoyed by #1 that they're willing to sacrifice their credibility as an intellectually honest observer.
#3 can be true and yet not be enough to make your case. Many failed technologies achieved impressive engineering milestones. Even the harshest critic could probably brainstorm some niche applications for a hallucination machine or whatever.
It says that people need training on what the appropriate use-cases for LLMs are.
This is not the type of report I'd use an LLM to generate. I'd use a database or spreadsheet.
Blindly using and trusting LLMs is a massive minefield that users really don't take seriously. These mistakes are amusing, but eventually someone is going to use an LLM for something important and hallucinations are going to be deadly. Imagine a pilot or pharmacist using an LLM to make decisions.
Some information needs to come from authoritative sources in an unmodified format.
It only makes it worthless for implementations where you require data. There's a universe of LLM use cases that aren't asking ChatGPT to write a report or using it as a Google replacement.
The problem is that, yes, LLMs are great when working on some regular thing for the first time. You can get started at a speed never before seen in the tech world.
But as soon as your use case goes beyond that, LLMs are almost useless.
The main complaint is that, while it's extremely helpful in that specific subset of problems, it's not actually pushing human knowledge forward. Nothing novel is being created with it.
It has created this illusion of being extremely helpful when in reality it is a shallow kind of help.
> If it makes data up, then it is worthless for all implementations.
Not true. It's only worthless for the things you can't easily verify. If you have a test for a function and ask an LLM to generate the function, it's very easy to say whether it succeeded or not.
In some cases, just being able to generate the function with the right types will mostly mean the LLM's solution is correct. Want a `List(Maybe a) -> Maybe(List(a))`? There's a very good chance an LLM will either write the right function or fail the type check.
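For a concrete sense of what that looks like, here is a rough Python analogue of that signature plus the kind of tests you'd check an LLM's output against (the body below is just one plausible implementation of the sort you'd ask the model to produce):

```python
from typing import Optional, TypeVar

T = TypeVar("T")

def sequence(xs: list[Optional[T]]) -> Optional[list[T]]:
    """Python stand-in for List(Maybe a) -> Maybe(List(a)):
    return None if any element is missing, else the full list."""
    out: list[T] = []
    for x in xs:
        if x is None:
            return None  # one missing value poisons the whole result
        out.append(x)
    return out

# The verification step is trivial compared to writing the function:
assert sequence([1, 2, 3]) == [1, 2, 3]
assert sequence([1, None, 3]) is None
assert sequence([]) == []
```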
In a research context, it provides pointers, and keywords for further investigation. In a report-writing context it provides textual content.
Neither of these nor the thousand other uses are worthless. It's when you expect a working and complete work product that it's (subjectively, maybe) worthless, but frankly aiming for that with current-gen technology is a fool's errand.
It mostly says that one of the seriously difficult challenges with LLMs is a meta-challenge:
* LLMs are dangerously useless for certain domains.
* ... but can be quite useful for others.
* The real problem is: They make it real tricky to tell, because most of all they are trained to sound professional and authoritative. They hallucinate papers because that's what authoritative answers look like.
That already means I think LLMs are far less useful than they appear to be. It doesn't matter how amazing a technology is: If it has failure modes and it is very difficult to know what they are, it's dangerous technology no matter how awesome it is when it is working well. It's far simpler to deal with tech that has failure modes but you know about them / once things start failing it's easy to notice.
Add to it the incessant hype, and, oh boy. I am not at all surprised that LLMs have a ridiculously wide range as to detractors/supporters. Supporters of it hype the everloving fuck out of it, and that hype can easily seem justified due to how LLMs can produce conversational, authoritative sounding answers that are explicitly designed to make your human brain go: Wow, this is a great answer!
... but experts read it and can see the problems there. Lots of tech suffers from this. As a random example, plenty of highly upvoted, apparently fantastically written Stack Overflow answers have problems: it's a great answer... for 10 years ago; it's a bad idea today because the answer has been obsoleted.
But between the fact that it's overhyped and particularly complex to determine an LLM answer is hallucinated drivel, it's logical to me that experts are hyperbolic when highlighting the problems. That's a natural reaction when you have a thing that SEEMS amazing but actually isn't.
You, and the OP, are being unfair in your replies. Obviously it's not worthless for all applications, but when LLMs obviously fail in disastrous ways in some important areas, you can't refute that by going "actually it gives me coding advice and generates images".
That's nice and impressive, but there are still important issues and shortcomings. Obligatory, semi-related xkcd: https://xkcd.com/937/
All of these anecdotal stories about "LLM" failures need to go into more detail about what model, prompt, and scaffolding was used. It makes a huge difference. Were they using Deep Research, which searches for relevant articles and brings facts from them into the report? Or did they type a few sentences into ChatGPT Free and blindly take it on faith?
LLMs are _tools_, not oracles. They require thought and skill to use, and not every LLM is fungible with every other one, just like flathead, Phillips, and hex-head screwdrivers aren't freely interchangeable.
If any non-trivial ask of an LLM also requires the prompts/scaffolding to be listed, and independently verified, along with its output, their utility is severely diminished. They should be saving time not giving us extra homework.
That isn't what I'm saying. I'm saying you can't make a blanket statement that LLMs in general aren't fit for some particular task. There are certainly tasks where no LLM is competent, but for others, some LLMs might be suitable while others are not. At least some level of detail beyond "they used an LLM" is required to know whether a) there was user error involved, or b) an inappropriate tool was chosen.
Are they? Every foundation model release includes benchmarks with different levels of performance in different task domains. I don't think I've seen any model advertised by its creating org as either perfect or even equally competent across all domains.
The secondary market snake oil salesmen <cough>Manus</cough>? That's another matter entirely and a very high degree of skepticism for their claims is certainly warranted. But that's not different than many other huckster-saturated domains.
People like Zuckerberg go around claiming most of their code will be written by AI starting sometime this year. Other companies are hearing that and using it as a reason (or false cover) for layoffs. The reality is LLMs still have a way to go before replacing experienced devs, and even when they start getting there, there will be a period of time where we’re learning what we can and can’t trust them with and how to use them effectively and responsibly. Feels like at least a few years from now, but the marketing says it’s now.
In many, many cases those problems are resolved by improvements to the model. The point is that making a big deal about LLM fuck ups in 3 year old models that don't reproduce in new ones is a complete waste of time and just spreads FUD.
Did you read the original tweet? She mentions the models and gives high level versions of her prompts. I'm not sure what "scaffolding" is.
You're right that they're tools, but I think the complaint here is that they're bad tools, much worse than they are hyped to be, to the point that they actually make you less efficient because you have to do more legwork to verify what they're saying. And I'm not sure that "prompt training," which is what I think you're suggesting, is an answer.
I had several bad experiences lately. With Claude 3.7 I asked how to restore a running database in AWS to a snapshot (RDS, if anyone cares). It basically said "Sure, just go to the db in the AWS console and select 'Restore from snapshot' in the actions menu." There was no such button. I later read AWS docs that said you cannot restore a running database to a snapshot, you have to create a new one.
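For what it's worth, the flow the docs actually describe looks roughly like this; a minimal boto3 sketch, with hypothetical identifiers:

```python
import boto3

rds = boto3.client("rds")

# RDS snapshots restore into a NEW instance; there is no
# "restore this running instance in place" button or API call.
rds.restore_db_instance_from_db_snapshot(
    DBInstanceIdentifier="mydb-restored",             # new instance name (made up)
    DBSnapshotIdentifier="mydb-snapshot-2025-03-01",  # existing snapshot (made up)
)
# Typical follow-up: wait for the new instance to become available, repoint
# the application (or swap names), then retire the old database.
```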
I'm not sure that any amount of prompting will make me feel confident that it's finally not making stuff up.
I was responding to the "they used an LLM" story about the Norwegian school report, not the original tweet. The original tweet has a great level of detail.
I agree that hallucination is still a problem, albeit a lot less of one than it was in the recent past. If you're using LLMs for tasks where you are not directly providing it the context it needs, or where it doesn't have solid tooling to find and incorporate that context itself, that risk is increased.
Why do you think these details are important? The entire point of these tools is that I am supposed to be able to trust what they say. The hard work is precisely to be able to spot which things are true and false. If I could do that I wouldn't need an assistant.
> The entire point of these tools is that I am supposed to be able to trust what they say
Hard disagree, and I feel like this assumption might be at the root of why some people seem so down on LLMs.
They’re a tool. When they’re useful to me, they’re so useful they save me hours (sometimes days) and allow me to do things I couldn’t otherwise, and when they’re not they’re not.
It never takes me very long to figure out which scenario I’m in, but I 100% understand and accept that figuring that out is on me and part of the deal!
Sure, if you think you can “vibe code” (or “vibe founder”) your way to massive success by getting LLMs to do stuff you’re clueless about, without any way to check, you’re going to have a bad time, but the fact they can’t (so far) do that doesn’t make them worthless.
Sounds like a user problem, though. When used properly as a tool they are incredible. When you give up 100% trust to them to be perfect it’s you that is making the mistake.
Well yeah, it's fancy autocomplete. And it's extremely amazing what 'fancy autocomplete' is able to do, but making the decision to use an LLM for the type of project you described is effectively just magical thinking. That isn't an indictment against LLM, but rather the person who chose the wrong tool for the job.
Some of the more modern tools do exactly that. If you upload a CSV to Claude, it will not (or at least not anymore) try to process the whole thing. It will read the header, and then ask you what you want. It will then write the appropriate Javascript code and run it to process the data and figure out the stats/whatever you asked it for.
I recently did this with a (pretty large) exported CSV of calories/exercise data from MyFitnessPal and asked it to evaluate it against my goals/past bloodwork etc (which I have in a "Claude Project" so that it has access to all that information + info I had it condense and add to the project context from previous convos).
It wrote a script to extract out extremely relevant metrics (like ratio of macronutrients on a daily basis for example), then ran it and proceeded to talk about the result, correlating it with past context.
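Roughly this kind of thing (Claude writes JavaScript in its analysis tool; this is a Python sketch of the same idea, and the MyFitnessPal column names are guesses, not the real export format):

```python
import csv
from collections import defaultdict

def daily_macro_ratios(path: str) -> dict[str, dict[str, float]]:
    """Sum grams of each macro per day, then express them as ratios of the daily total."""
    totals: dict[str, dict[str, float]] = defaultdict(
        lambda: {"Protein": 0.0, "Carbs": 0.0, "Fat": 0.0}
    )
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            for macro in ("Protein", "Carbs", "Fat"):  # hypothetical column names
                totals[row["Date"]][macro] += float(row.get(macro) or 0)

    ratios = {}
    for day, grams in totals.items():
        total = sum(grams.values()) or 1.0
        ratios[day] = {m: round(g / total, 2) for m, g in grams.items()}
    return ratios
```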
Use the tools properly and you will get the desired results.
Often they will do exactly that; currently their reasoning isn't the best, so you may have to coax it to take the best path. It's also making judgement calls while writing the code, so that's worth checking too. No different to a senior instructing an intern.
"Even a journey of 1,000 miles begins with the first step. Unless you're an AI hyper then taking the first step is the entire journey - how dare you move the goalposts"
"They continue to fabricate links, references, and quotes, like they did from day one." - "I ask them to give me a source for an alleged quote, I click on the link, it returns a 404 error."
Why have these companies not manually engineered out a problem like this by now? Just do a check to make sure links are real. That's pretty unimpressive to me.
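The check being asked for is not exotic. A minimal sketch of the idea (this is not how any provider actually implements it, just the shape of it):

```python
import requests

def link_is_live(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL resolves to something other than an error."""
    try:
        resp = requests.head(url, allow_redirects=True, timeout=timeout)
        if resp.status_code == 405:  # some servers reject HEAD; retry with GET
            resp = requests.get(url, stream=True, timeout=timeout)
        return resp.status_code < 400
    except requests.RequestException:
        return False

# A provider could drop, regenerate, or flag any cited link that fails this
# check before showing it to the user.
```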
There are no fabricated links, references, or quotes in OpenAI's GPT-4.5 + Deep Research.
It's unfortunate the cost of a Deep Research bespoke white paper is so high. That mode is phenomenal for pre-work domain research. You get an analyst's two-week writeup in under 20 minutes, for the low cost of $200/month (though I've seen estimates that such a white paper costs OpenAI over USD 3,000 to produce for you, which explains the monthly limits).
You still need to be a domain expert to make use of this, just as you need to be to make use of an analyst. Both the analyst and Deep Research can generate flawed writeups with similar misunderstandings: mis-synthesizing, misapplication, or missing inclusion of some essential.
Neither analyst nor LLM is a substitute for mastery.
How do people in the future become domain experts capable of properly making use of it if they are not the analyst spending two weeks on the write-up today?
My complaint with Deep Research LLMs is that they don't go deeper than 2 pages of SERPs. I want them to dig down into obscure stuff, not list cursorily relevant peripheral directions. They just seem to do breadth-first rather than depth-first search.
This assessment is incomplete. Large language models are both less and more than these traditional tools. They have not subsumed them, and all can sit together in separate tabs of a single browser window. They are another resource, and when the conditions are right, which is often the case in my experience, they are a startlingly effective tool for navigating the information landscape. The criticism of Gemini is a fair one, and I encountered it yesterday, but perhaps with 50% less entitlement. But Gemini also helped me translate obscure termios APIs to Python from C source code I provided. The equivalent using search and/or Stack Overflow would have required multiple piecemeal searches without guarantees -- and definitely would have taken much more time.
The 404 links are hilarious, like you can't even parse the output and retry until it returns a link that doesn't 404? Even ignoring the billions in valuation, this is so bad for a $20 sub.
The tweeter's complaints sound like a user problem. LLMs are tools. How you use them, when you use them, and what you expect out of them should be based on the fact that they are tools.
I’m sorry but the experience of coding with an LLM is about ten billion times better than googling and stack overflowing every single problem I come across. I’ve stack overflowed maybe like two things in the past half year and I’m so glad to not have to routinely use what is now a very broken search engine and web ecosystem.
How did you measure and compare googling/stack overflow to coding with an LLM? How did you get to the very impressive number ten billion times better?! Can you share your methodology? How have you defined better?
That's part of it. The other part is Google sacrificing product quality for excessive monetization. An example would be YouTube search - first three results are relevant, next 12 results are irrelevant "people also watched", then back to relevant results. Another example would be searching for an item to buy and getting relevant results in the images tab of google, but not the shopping tab.
It’s broken because Google has spent 20+ years promoting garbage content in a self-serving way. No one was able to compete unless they played by Google’s rules, and so all we have left is blog spam and regular spam.
I didn't notice that example. I doubt top-tier models have issues with that. I was more referencing Sabine's mentions of hallucinated citations and papers, which is an issue I also had 2 years ago but is probably solved by Deep Research at this point. She just has massive skill issues and doesn't know what she's doing.
>What are the use cases where the expected performance is high?
o1-pro is probably at top tier human level performance on most small coding tasks and definitely at answering STEM questions. o3 is even better but not released outside of it powering Deep Research.
> This is just not a use case where the expected performance on these tasks is high.
Yet the hucksters hyping AI are falling all over themselves saying AI can do all this stuff. This is where the centi-billion dollar valuations are coming from. It's been years and these super hyped AIs still suck at basic tasks.
When pre-AI shit Google gave wrong answers it at least linked to the source of the wrong answers. LLMs just output something that looks like a link and calls it a day.
"After glowing reviews, I spent $200 to try it out for my research. It hallucinated 8 of 10 references on a couple of different engineering topics. For topics that are well established (literature search), it is useful, although o3-mini-high with web search worked even better for me. For truly frontier stuff, it is still a waste of time."
"I've had the hallucination problem too, which renders it less than useful on any complex research project as far as I'm concerned."
These quotes are from the link you posted. There are a lot more.
The whole point is that an LLM is not a search engine and obviously anyone who treats it as one is going to be unsatisfied. It's just not a sensible comparison. You should compare working with an LLM to working with an old "state of the art" language tool like Python NLTK -- or, indeed, specifying a problem in Python versus specifying it in the form of a prompt -- to understand the unbridgeable gap between what we have today and what seemed to be the best even a few years ago. I understand when a popular science author or my relatives haven't understood this several years after mass access to LLMs, but I admit to being surprised when software developers have not.
Hosted and free or subscription-based DeepResearch like tools that integrate LLMs with search functionality (the whole domain of "RAG" or "Retrieval Augmented Generation") will be elementary for a long time yet simply because the cost of the average query starts to go up exponentially and there isn't that much money in it yet. Many people have and will continue to build their own research tools where they can determine how much compute time and API access cost they're willing to spend on a given query. OCR remains a hard problem, let alone appropriately chunking potentially hundreds of long documents into context length and synthesizing the outputs of potentially thousands of LLM outputs into a single response.
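To make the chunking point concrete, a toy version of that step; real pipelines split on tokens rather than characters and try to respect sentence or section boundaries:

```python
def chunk(text: str, max_chars: int = 4000, overlap: int = 200) -> list[str]:
    """Split text into overlapping windows so facts straddling a boundary survive."""
    assert overlap < max_chars
    chunks, start = [], 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap
    return chunks
```

Each chunk then gets its own LLM call, and the per-chunk answers still have to be synthesized, which is where the cost multiplies.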
Certainly. I agree of course as to the problem of hype and I'm aware of how many people use LLMs today. I tried to emphasize in my earlier post that I can understand why someone like Sabine has the opinion she does -- I'm more confused how there's still similar positions to be found among software developers, evidenced often within Hacker News threads like the one we're in. I don't intend that to refer to you, who clearly has more than a passing knowledge of LLM internals, but more to the original commenter I was responding to.
More than marketing, I think from my experience it's chat, with little control over context, as the primary interface most non-engineers have to LLMs that leads to (mis)expectations of the tool in front of them. Having so little control over what is actually being input to the model makes it difficult to learn to treat a prompt as something more like a program.
It's mostly because of how they were initially marketed. In an effort to drive hype 'we' were promised the world. Remember the "leaks" from Google about an engineer trying to get the word out that they had created a sentient intelligence? In reality Bard, let alone whatever early version he was using, is about as sentient as my left asscheek.
OpenAI did similar things by focusing to the point of absurdity on 'safety' for what was basically a natural language search engine with a habit of inventing nonsensical stuff. But on that same note (and also as you alluded to), I do agree that LLMs have a lot of use as natural language search engines in spite of their proclivity to hallucinate. Being able to describe e.g. a function call (or some esoteric piece of history) and then often get the precise term/event that I'm looking for is just incredibly useful.
But LLMs obviously are not sentient, are not setting us on the path to AGI, or any other such nonsense. They're arguably what search engines should have been 10 or 15 years ago, but anti-competitive monopolization of the industry meant that search engine technology progress basically stalled out, if not regressed for the sake of ads (and individual 'entrepreneurs' becoming better at SEO), about the time Google fully established itself.
> Remember the "leaks" from Google about an engineer trying to get the word out that they had created a sentient intelligence?
I presume you are referring to this Google engineer, who was sacked for making the claim. Hardly an example of AI companies overhyping the tech; precisely the opposite, in fact.
https://www.bbc.co.uk/news/technology-62275326
It seems to be a common human hallucination to imagine that large organisations are conspiring against us.
Corporations are motivated by profit, not doing what's best for humanity. If you need an example of "large organizations conspiring against us," I can give you twenty.
I agree that sometimes organisations conspire against people. My point was, in case it wasn't apparent, the irony that somenameforme was talking about how LLMs were of little use because they hallucinate, whilst apparently hallucinating a conspiracy by AI companies to overhype the technology.
I wasn't making a political point. You see similar evidence-free allegations against international organisations and national government bodies.
> Remember the "leaks" from Google about an engineer trying to get the word out that they had created a sentient intelligence?
That's not what happened. Google stomped hard on Lemoine, saying clearly that he was wrong about LaMDA being sentient ... and then they fired him for leaking the transcripts.
Your whole argument here is based on false information and faulty logic.
Were you perchance noting that according to some people «LLMs ... can hallucinate and create illogical outputs» (you also specified «useless», but that must be a further subset and will hardly create a «litter[ing]» here), but also that some people use «false information and faulty logic»?
Noting that people are imperfect is not a justification for the weaknesses in LLMs. Since around late 2022 some people started stating LLMs are "smart like their cousin", to which the answer remains "we hope that your cousin has a proportionate employment".
If you built a crane that only lifts 15kg, it's no justification that "many people lift 10". The purpose of the crane is to lift as needed, with abundance for safety.
If we build cranes, it is because people are not sufficient: the relative weakness of people is, far from a consolation of weak cranes, the very reason why we want strong cranes. Similarly for intelligence and other qualities.
People are known to use «false information and faulty logic»: but they are not being called "adequate".
> angry at
There's a subculture around here that thinks it normal to downvote without any rebuttal - equivalent to "sneering and leaving" (quite impolite), almost all times it leaves us without a clue about what could be the point of disapproval.
I think you're missing the point. He's pointing out what the atmosphere was/is around LLMs in these discussions, and how that impacts stories like with Lemoine.
I mean, you're right that he's silly and Google didn't want to be part of it, but it was (and is?) taken seriously that: LLMs are nascent AGI, companies are pouring money to get there first, we might be a year or two away. Take these as true, it's at least possible that Google might have something chained up in their basement.
In retrospect, Google dismissed him because he was acting in a strange and destructive way. At the time, it could be spun as just further evidence: they're silencing him because he's right. Could it have created such hysteria and silliness if the environment hadn't been so poisoned by the talk of imminent AGI/sentience?
Which comment claimed that LLMs were marketed as super-intelligence? I'm looking up the chain and I can't see it.
I don't think they were, but I think it's pretty clear they were marketed as being the imminent path to super-intelligence, or something like it. OpenAI were saying GPT-(n-1) is as intelligent as a high school student, GPT-(n) is a university student, GPT-(n+1) will be... something.
That's the whole discussion here: "It's mostly because of how they were initially marketed. In an effort to drive hype 'we' were promised the world. Remember the "leaks" from Google about an engineer trying to get the word out that they had created a sentient intelligence?"
I did not miss any point and that's an ad hominem charge. He misrepresented the facts and based an argument on that misrepresentation and I pointed that out.
"In retrospect, Google dismissed him because he was acting in a strange and destructive way."
No, they dismissed him because he had released Google internal product information, "In retrospect" or otherwise.
> OpenAI did similar things by focusing to the point of absurdity on 'safety' for what was basically a natural language search engine that has a habit of inventing nonsensical stuff.
The focus on safety, and the concept of "AI", preexisted the product. An LLM was just the thing they eventually made; it wasn't the thing they were hoping to make. They applied their existing beliefs to it anyway.
I am worried about them as a substitute for search engines.
My reasoning is that classic google web-scraping and SEO, as shitty as it may be, is 'open-source' (or at least, 'open-citation') in nature - you can 'inspect the sh*t it's built from'.
Whereas LLMs, to me, seem like a Chinese - or Western - totalitarian political system's wet dream: 'we can set up an inscrutable source of "truth" for the people to use, with the _truths_ we intend them to receive'.
We already saw how weird and unsane this was, when they were configured to be woke under the previous regime. Imagining it being configured for 'the other post-truth' is a nightmare.
> Remember the "leaks" from Google about an engineer trying to get the word out that they had created a sentient intelligence?
No, first time I hear about it. I guess the secret to happiness is not following leaks. I had very low expectations before trying LLMs and I’m extremely impressed now.
Not following leaks, or just the news, not living in the real world, not caring about the consequences of reality: anybody can think he's """happy""" with psychedelia and with just living in a private world. But it is the same kind of "happy" that comes with "just smile".
If you did not get the information that there are severe pitfalls - which is, by the way, quite unrelated to the "it's sentient" thing, as we are talking about the faults in the products, not the faults in human fools - you are supposed to see them from your own judgement.
They have their value in analyzing huge amounts of data, for example scientific papers or raw observations, but the popular public ones are mostly trained on stolen/pirated texts off the internet and from the social media clouds the companies control. So this means: bullshit in -> bullshit out. I don't need machines for that; the regular human bullshitters do this job just fine.
> the popular public ones are mostly trained on stolen/pirated texts off the internet
You mean like actual literature, textbooks and scientific papers? You can't get them in bulk without pirating. Thank intellectual property laws.
> from social media clouds the companies control
I.e. conversations of real people about matters of real life.
But if it satisfies your elitist, ivory-towerish vision of "healthy information diet" for LLMs, then consider that e.g. Twitter is where, until now, you'd get most updates from the best minds in several scientific fields. Or that besides r/All, the Reddit dataset also contains r/AskHistorians and other subreddits where actual experts answer questions and give first-hand accounts of things.
The actually important bit though, is that LLM training manages to extract value from both the "bullshit" and whatever you'd call "not bullshit", as the model has to learn to work with natural language just as much as it has to learn hard facts or scientific theories.
Yes, I find the biggest issue in discussing the present state of AI with people outside the field, whether technical or not, is that "machine learning" had only just entered popular understanding: i.e. everyone seems ready today to talk about the limits of training a machine learning model on X limited data set, unable to extrapolate beyond it. The difference between "learning the best binary classifier on a labelled training set" and "exploring the set of all possible programs representable by a deep neural network of whatever architecture to find that which best generates all digitally recorded traces of human beings throughout history" is very far from intuitive to even specialists. I think Ilya's old public discussions of this question are the most insightful for a popular audience, explaining how and why a world model and not simply a Markov chain is necessary to solve the seemingly trivial problem of "predicting the next word in a sequence."
Nobody promised the world. The marketing underpromised and LLMs overdelivered. Safety worries didn't come from marketing, it came from people who were studying this as a mostly theoretical worry for the next 50+ years, only to see major milestones crossed a decade or more before they expected.
Did many people overhype LLMs? Yes, like with everything else (transhumanist ideas, quantum physics). It helps being more picky who one listens to, and whether they're just painting pretty pictures with words, or actually have something resembling a rational argument in there.
Folks really over-index when an LLM is very good for their use case. And most of the folks here are coders, at which they're already good and getting better.
For some tasks they're still next to useless, and people who do those tasks understandably don't get the hype.
Tell a lab biologist or chemist to use an LLM to help them with their work and they'll get very little useful out of it.
Ask an attorney to use it and it's going to miss things that are blindingly obvious to the attorney.
Ask a professional researcher to use it and it won't come up with good sources.
For me, I've had a lot of those really frustrating experiences where I'm having difficulty on a topic and it gives me utterly incorrect junk because there just isn't a lot already published about that topic.
I've fed it tricky programming tasks and gotten back code that doesn't work, and that I can't debug because I have no idea what it's trying to do, or I'm not familiar with the libraries it used.
It sounds like you're trying to use these llms as oracles, which is going to cause you a lot of frustration. I've found almost all of them now excel at imitating a junior dev or a drunk PhD student. For example the other day I was looking at acoustic sensor data and I ran it down the trail of "what are some ways to look for repeating patterns like xyz" and 10 minutes later I had a mostly working proof of concept for a 2nd order spectrogram that reasonably dealt with spectral leakage and a half working mel spectrum fingerprint idea. Those are all things I was thinking about myself, so I was able to guide it to a mostly working prototype in very little time. But doing it myself from zero would've taken at least a couple of hours.
But truthfully 90% of work related programming is not problem solving, it's implementing business logic. And dealing with poor, ever changing customer specs. Which an llm will not help with.
> But truthfully 90% of work related programming is not problem solving, it's implementing business logic. And dealing with poor, ever changing customer specs. Which an llm will not help with.
Au contraire, these are exactly the things LLMs are super helpful at - most business logic in any company is just doing the same thing every other company is doing; there aren't that many unique challenges in day-to-day programming (or business in general). And then, more than half of the work of "implementing business logic" is feeding data in and out, presenting it to the user, and a bunch of other things that boil down to gluing together preexisting components and frameworks - again, a kind of work that LLMs are quite a big time-saver for, if you use them right.
Strongly in agreement. I've tried them and mostly come away unimpressed. If you work in a field where you have to get things right, and it's more work to double check and then fix everything done by the LLM, they're worse than useless. Sure, I've seen a few cases where they have value, but they're not much of my job. Cool is not the same as valuable.
If you think "it can't quite do what I need, I'll wait a little longer until it can" you may still be waiting 50 years from now.
> If you work in a field where you have to get things right, and it's more work to double check and then fix everything done by the LLM, they're worse than useless.
Most programmers understand reading code is often harder than writing it. Especially when someone else wrote the code. I'm a bit amused by the cognitive dissonance of programmers understanding that and then praising code handed to them by an LLM.
It's not that LLMs are useless for programming (or other technical tasks) but they're very junior practitioners. Even when they get "smarter" with reasoning or more parameters their nature of confabulation means they can't be fully trusted in the way their proponents suggest we trust them.
It's not that people don't make mistakes, but they often make reasonable mistakes. LLMs make unreasonable mistakes at random. There's no way to predict the distribution of their mistakes. I can learn that a human junior developer sucks at memory management or something. I can ask them to improve areas they're weak in and check those areas of their work in more detail.
I have to spend a lot of time reviewing all output from LLMs because there's rarely rhyme or reason to their errors. They save me a bunch of typing but replace a lot of my savings with reviews and debugging.
My view is that it will be some time before they can, precisely because of the success in the software domain - not because LLMs aren't capable as a tech, but because data owners and practitioners in other domains will resist the change. From the SWE experience, news reports, financial magazines, etc., many are preparing accordingly, even if it is a subconscious thing. People don't like change, and don't want to be threatened when it is them at risk - no one wants what happened to artists and now SWEs to happen to their profession. They are happy for other professions to "democratize/commoditize" as long as it isn't them - after all, this increases their purchasing power. Don't open source knowledge/products, don't let AI near your vertical domain, continue to command a premium for as long as you can - I've heard variations of this in many AI conversations. It's much easier in oligopoly- and monopoly-like domains and/or domains where knowledge was known to be a moat even when mixed with software, as you have more trust that competitors won't do the same.
For many industries/people, work is a means to earn, not something to be passionate about for its own sake. It's a means to provide for other things in life you are actually passionate about (e.g. family, lifestyle, etc.). In the end AI may get your job eventually, but if it gets you much later vs other industries/domains, you win from a capital perspective as other goods get cheaper and you still command your pre-AI scarcity premium. This makes it easier for them to acquire more assets from the early disrupted industries and shields them from eventual AI takeover.
I'm seeing this directly in software. Fewer new frameworks/libraries/etc. outside the AI domain being published IMO, more apprehension from companies to open source their work and/or expose what they do, etc. Attracting talent is also no longer as strong of a reason to showcase what you do to prospective employees - economic conditions and/or AI make that less necessary as well.
I frequently see news stories where attorneys get in trouble for using LLMs, because they cite hallucinated case law (e.g.). If they didn't get caught, that would look the same as using them "productively".
Asking the LLM for relevant case law and checking it up - productive use of LLM. Asking the LLM to write your argument for you and not checking it up - unproductive use of LLM. It's the same as with programming.
>Asking the LLM for relevant case law and checking it up - productive use of LLM
That's a terrible use for an LLM. There are several deterministic search engines attorneys use to find relevant case law, where you don't have to check to see if the cases actually exist after it produces results. Plus, the actual text of the case is usually very important, and isn't available if you're using an LLM.
Which isn't to say they're not useful for attorneys. I've had success getting them to do some secretarial and administrative things. But for the core of what attorneys do, they're not great.
For law firms creating their own repositories of case law, having LLMs search via summaries, and then dive into the selected cases to extract pertinent information seems like an obvious great use case to build a solution using LLMs.
The orchestration of LLMs that will be reading transcripts, reading emails, reading case law, and preparing briefs with sources is unavoidable in the next 3 years. I don’t doubt multiple industry-specialized solutions are already under development.
Just asking ChatGPT to make your case for you is missing the opportunity.
If anyone is unable to get Claude 3.7 or Gemini 2.5 to accelerate their development work, I have to doubt their sentience at this point. (Or, more likely, doubt that they’re actively testing these things regularly.)
Law firms don't create their own repos of case law. They use a database like westlaw or lexis. LLMs "preparing briefs with sources" would be a disaster and wholly misunderstands what legal writing entails.
I find it very useful to review the output and consider its suggestions.
I don’t trust it blindly, and I often don’t use most of what it suggests; but I do apply critical thinking to evaluate what might be useful.
The simplest example is using it as a reverse dictionary. If I know there’s a word for a concept, I’ll ask an LLM. When I read the response, I either recognize the word or verify it using a regular dictionary.
I think a lot of the contention in these discussions is because people are using it for different purposes: it's unreliable for some purposes and it is excellent at others.
> Asking the LLM for relevant case law and checking it up - productive use of LLM.
Only if you're okay with it missing stuff. If I hired a lawyer, and they used a magic robot rather than doing proper research, and thus missed relevant information, and this later came to light, I'd be going after them for malpractice, tbh.
Surely this was meant ironically, right? You must've heard of at least one of the many cases involving lawyers doing precisely what you described and ending up presenting made up legal cases in court. Guess how that worked out for them.
The uses that they cited to me were "additional pair of eyes in reviewing contracts," and, "deep research to get started on providing a detailed overview of a legal topic."
Honestly it's worse than this. A good lab biologist/chemist will try to use it, understand that it's useless, and stop using it. A bad lab biologist/chemist will try to use it, think that it's useful, and then it will make them useless by giving them wrong information. So it's not just that people over-index when it is useful, they also over-index when it's actively harmful but they think it's useful.
You think good biologists never need to summarize work into digestible language, or fill out multiple huge, redundant grant applications with the same info, or reformat data, or check that a writeup accurately reflects data?
I’m not a biologist (good or bad) but the scientists I know (who I think are good) often complain that most of the work is drudgery unrelated to the science they love.
Sure, lots of drudgery, but none of your examples are things that you could trust an LLM to do correctly when correctness counts. And correctness always counts in science.
Edit to add: and regardless, I'm less interested in the "LLM's aren't ever useful to science" part of the point. The point that actual LLM usage in science will mostly be for cases where they seem useful but actually introduce subtle problems is much more important. I have observed this happening with trainees.
The problem Sabine tries to communicate is that reality is different from what the cash-heads behind the main commercial models are trying to portray. They push the narrative that they’ve created something akin to human cognition, when in reality they’ve just optimised prediction algorithms on an unprecedented scale. They are trying to say that they created Intelligence, which is the ability to acquire and apply knowledge and skills, but we all know the only real Intelligence they are creating is the collection of information of military or political value.
The technology is indeed amazing and very amusing, but like all the good things in the hands of corporate overlords, it will be slowly turning into profit-milking abomination.
> They push the narrative that they’ve created something akin to human cognition
This is your interpretation of what these companies are saying. I'd love to see whether any company has specifically said anything like that.
Out of the last 100 years, how many inventions have been made that could make any human awe like LLMs do right now? How many things from today, when brought back to 2010, would make the person using them feel like they're being tricked or pranked? We already take them for granted even though they've only been around for less than half a decade.
LLMs aren't a catch-all solution to the world's problems, or something that is going to help us in every facet of our lives, or an accelerator for every industry that exists out there. But at no point in history could you talk to your phone about general topics, get information, practice language skills, build an assistant that teaches your kid the basics of science, use something to accelerate your work in many different ways, etc...
Looking at LLMs shouldn't be boolean; it shouldn't be a choice between "they're the best thing ever invented" and "they're useless". But it seems like everyone presents the issue in this manner, and Sabine is part of that problem.
No major company directly states "We have created human-like intelligence," but they intentionally use suggestive language that leads people to think AI is approaching human cognition. This helps with hype, investment, and PR.
>I'd love to see whether any company has specifically said anything like that.
1. Microsoft researchers: Sparks of Artificial General Intelligence: Early experiments with GPT-4 - https://arxiv.org/abs/2303.12712
2. "GPT-4 is not AGI, but it does exhibit more general intelligence than previous models." - Sam Altman
3. Musk has claimed that AI is on the path to "understanding the universe." His branding of Tesla's self-driving AI as "Full Self-Driving" (FSD) also misleadingly suggests a level of autonomous reasoning that doesn't exist.
4. Meta's AI chief scientist, Yann LeCun, has repeatedly said they are working on giving AI "common sense" and "world models" similar to how humans think.
>Out of the last 100 years how many inventions have been made that could make any human awe like llms do right now?
ELIZA is an early natural language processing computer program developed from 1964 to 1967.
ELIZA's creator, Weizenbaum, intended the program as a method to explore communication between humans and machines. He was surprised and shocked that some people, including his secretary, attributed human-like feelings to the computer program. That was 60 years ago.
So as you can see, us humans are not too hard to fool with this.
ELIZA was not a natural language processor, and the fact that some people were easily fooled by a program that produced canned responses based on keywords in the text but was presented as a psychotherapist is not relevant to the issue here--it's a fallacy of affirmation of the consequent.
Also,
"4. Meta's AI chief scientist, Yann LeCun, has repeatedly said they are working on giving AI "common sense" and "world models" similar to how humans think."
completely misses the mark. That LLMs don't do this is a criticism from old-school AI researchers like Gary Marcus; LeCun is saying that they are addressing the criticism by developing the sorts of technology that Marcus says are necessary.
> they intentionally use suggestive language that leads people to think AI is approaching human cognition. This helps with hype, investment, and PR.
As do all companies in the world. If you want to buy a hammer, the company will sell it as the best hammer in the world. It's the norm.
I don't know exactly what your point is with ELIZA?
> So as you can see, us humans are not too hard to fool with this.
I mean ok? How is that related to having a 30 minute conversation with ChatGPT where it teaches you a language? Or Claude outputting an entire application in a single go? Or having them guide you through fixing your fridge by uploading the instructions? Or using NotebookLM to help you digest a scientific paper?
I'm not saying LLMs are not impressive or useful — I'm pointing out that the corporations behind commercial AI models are capitalising on our emotional response to natural language prediction. This phenomenon isn't new – Weizenbaum observed it 60 years ago, even with the simplest of algorithms like ELIZA.
Your example actually highlights this well. AI excels at language, so it’s naturally strong in teaching (especially for language learning ;)). But coding is different. It’s not just about syntax; it requires problem-solving, debugging, and system design — areas where AI struggles because it lacks true reasoning.
There's no denying that when AI helps you achieve or learn something new, it's a fascinating moment — proof that we're living in 2025, not 1967. But the more commercialised it gets, the more mythical and misleading the narrative becomes.
> system design — areas where AI struggles because it lacks true reasoning.
Others addressed code, but with system design specifically - this is more of an engineering field now, in that there's established patterns, a set of components at various levels of abstraction, and a fuck ton of material about how to do it, including but not limited to everything FAANG publishes as preparatory material for their System Design interviews. At this point in time, we have both a good theoretical framework and a large collection of "design patterns" solving common problems. The need for advanced reasoning is limited, and almost no one is facing unique problems here.
I've tested it recently, and suffice it to say, Claude 3.7 Sonnet can design systems just fine - in fact much better than I'd expect a random senior engineer to. Having the breadth of knowledge and being really good at fitting patterns is a big advantage it has over people.
> They push the narrative that they’ve created something akin to human cognition
I am saying they're not doing that, they're doing sales and marketing and it's you that interprets this as possible/true. In my analogy if the company said it's a hammer that can do anything, you wouldn't use it to debug elixir. You understand what hammers are for and you realize the scope is different. Same here. It's a tool that has its uses and limits.
> Your example actually highlights this well. AI excels at language, so it’s naturally strong in teaching (especially for language learning ;)). But coding is different. It’s not just about syntax; it requires problem-solving, debugging, and system design — areas where AI struggles because it lacks true reasoning.
I disagree, since I use it daily and Claude is really good at coding. It's saving me a lot of time. It's not gonna build a new Waymo, but I don't expect it to. But this is beside the point. In the original tweet, what Sabine is implying is that it's useless and OpenAI should be worth less than a shoe factory. In fact this is a very poor way to look at LLMs and their value, and both ends of the spectrum are problematic (those who say it's a catch-all AGI and those who say hurr, it couldn't solve P versus NP, it's trash).
I think one difference between a hammer and an LLM is that hammers have existed since forever, so common sense is assumed to be there as to what their purpose is. For LLMs though, people are still discovering on a daily basis to what extent they can usefully apply them, so it's much easier to take such promises made by companies out of context if you are not knowledgeable/educated on LLMs and their limitations.
Person you replied to:
> they intentionally use suggestive language that leads people to think AI is approaching human cognition. This helps with hype, investment, and PR.
Your response:
> As do all companies in the world. If you want to buy a hammer, the company will sell it as the best hammer in the world. It's the norm.
As a programmer (and GOFAI buff) for 60 years who was initially highly critical of the notion of LLMs being able to write code because they have no mental states, I have been amazed by the latest incarnations being able to write complex functioning code in many cases. There are, however, specific ways that not being reasoners is evident ... e.g., they tend to overengineer because they fail to understand that many situations aren't possible. I recently had an example where one node in a tree was being merged into another, resulting in the child list of the absorbed node being added to the child list of the kept node. Without explicit guidance, the LLM didn't "understand" (that is, its response did not reflect) that a child node can only have one parent so collisions weren't possible.
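(For what it's worth, here is a minimal sketch of the situation described, with hypothetical names of my own; the point is that because every child node has exactly one parent, absorbing one node's children into another's cannot produce duplicates, so collision handling is dead code:)

    # Hypothetical reconstruction of the merge described above (names are mine).
    # Each child node has exactly one parent, so when `absorbed` is merged into
    # `kept`, none of absorbed's children can already be in kept.children;
    # a plain extend is enough, and any collision check is unnecessary.
    class Node:
        def __init__(self, value):
            self.value = value
            self.parent = None
            self.children = []

    def merge_into(kept, absorbed):
        """Merge `absorbed` into `kept`: re-parent its children, then drop it."""
        for child in absorbed.children:
            child.parent = kept                    # its only parent was `absorbed`
        kept.children.extend(absorbed.children)    # duplicates are impossible here
        absorbed.children = []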
> proof that we’re living in 2025, not 1967. But the more commercialised it gets, the more mythical and misleading the narrative becomes
You seem to be living in 2024, or 2023. People generally have far more pragmatic expectations these days, and the companies are doing a lot less overselling ... in part because it's harder to come up with hype that exceeds the actual performance of these systems.
How many examples of CEOs writing shit like that can you name? I can name more than one. Elon had been saying that camera-driven Level 5 autonomous driving would be ready in 2021. Did you believe him?
Elon? Never did, and for the record, I also never really understood his fanboys. I never even bought a Tesla. And no, besides these two guys, I don't really remember many other CEOs making such revolutionary statements. That is usually the case when people understand their technology and are not ready to bullshit. There is one small differentiation though: at least the self-driving car hype was believable, because it seemed almost like a finite-search problem, something along the lines of: how hard could it be to process X input signals from lidars and image frames and marry them to an advanced variation of what is basically a PID controller? And at least there is a defined use-case. With genAI, we have no idea what the problem definition or even the problem space is, and the main use-case that the companies seem to be pushing down our throats (aside from code assistants) is "summarising your email" and chatting with your smartphone, for lonely people. Ew, thanks, but no thanks.
No mate, not everyone is trying hard to prove some guy on the Internet wrong. I do remember these two but to be honest, they were not on top of my mind in this context, probably because it's a different example - or what are you trying to say? That the people running AI companies should go to jail for deceiving their investors? This is different to Theranos. Holmes actively marketed and PRESENTED a "device" which did not exist as specified (they relied on 3rd party labs doing their tests in the background). For all that we know, OpenAI and their ilk are not doing that really. So you're on thin ice here. Amazon came close though, with their failed Amazon Go experiment, but they only invested their own money, so no damage was done to anyone. In either case your example is showing what? That lying is normal in the business world and should be done by the CEOs as part of their job description? That they should or should not go to jail for it? I am really missing your point here, no offence.
> In either case your example is showing what? That lying is normal in the business world and should be done by the CEOs as part of their job description? That they should or should not go to jail for it? I am really missing your point here, no offence.
If you run through the message chain you'll see first that the comment OP is claiming companies market LLMs as AGI, and then the next guy quotes Altman's tweet to support it. I am saying companies don't claim LLMs are AGI and that CEOs are doing CEO things; my examples are Elon (who didn't go to jail, btw) and the other two who did.
> For all that we know, OpenAI and their ilk are not doing that really.
I think you completely missed the point. Altman is definitely engaging in 'creative' messaging, as do other GenAI CEOs. But unlike Holmes and others, they are careful to wrap it in conditionals, future tense, and vague corporate speak about how something "feels" like this or that, rather than saying it definitely is this or that. Most of us dislike the fact that they are indeed implying this stuff is almost AGI, just around the corner, just a few more years, just a few more hundred billion dollars wasted on datacenters. Meanwhile we can see on a day-to-day basis that their tools are just advanced text generators. Anyone who finds them 'mindblowing' clearly does not have a complex enough use case.
I think you are missing the point. I never said it's the same nor is that what I am arguing.
> Anyone who finds them 'mindblowing' clearly does not have a complex enough use case.
What is the point of LLMs? If their only point is complex use cases then they're useless, let's throw them away. If their point/scope/application is wider and they're doing something for a non-negligible percentage of people, then who are you to gauge whether they deserve to be mindblowing to someone or not, regardless of their use case?
What is the point of LLMs? It seems nobody really knows, including the people selling them. They are a solution in search of a problem. But if you figure it out in the meanwhile, make sure to let everyone know. Personally I'd be happy with just having back Google as it was between roughly 2006 and 2019 (RIP), in place of the overly verbose statistical parrots.
> Out of the last 100 years how many inventions have been made that could make any human awe like llms do right now?
Lots, e.g. vacuum cleaners.
> But at no point in history could you talk to your phone
You could always "talk" to your phone just like you could "talk" to a parrot or a dog. What does that even mean?
If we're talking about LLMs, I still haven't been able to have a real conversation with one. There's too much lag for it to feel like a conversation, and it often doesn't reply with anything related.
> If we're talking about LLMs, I still haven't been able to have a real conversation with one. There's too much lag for it to feel like a conversation, and it often doesn't reply with anything related.
I don't believe this one bit. But keep on trucking.
Of course they aren't "real" conversations but I can dialog with LLMs as a means of clarifying my prompts. The comment about parrots and dogs is made in bad faith.
By your own admission, those are not dialogues, but merely query optimisations in an advanced query language. Like how you would tune an SQL query until you get the data you are expecting to see. That's what it is with LLMs.
> The comment about parrots and dogs is made in bad faith
Not necessarily. (Some aphonic, adactyl downvoters seem to have tried to nudge you into noticing that your idea above is against the entailed spirit of the guidelines.)
The poster may have meant that, for the use natural to him, the results feel about as useful as a discussion with a good animal. "Clarifying one's prompts" may be effective in some cases, but it's probably not what others seek. It is possible that many want the good old combination of "informative" and "insightful": in practice there may be issues with both.
> "Clarifying one's prompts" may be effective in some cases but it's probably not what others seek
It's not even that. Can the LLM run away, stop the conversation, or even say no? It's like your boss "talking" to you about the task without giving you a chance to respond. Is that a talk? It's one-way.
E.g. ask the LLM who invented Wikipedia. It will respond with "facts". If I ask a friend, the reply might be "look it up yourself". That is a real conversation. Until an LLM can do that, it isn't one.
Even parrots and dogs can respond with something other than the exact reply you forced them into.
> This is your interpretation of what these companies are saying. I'd love to see if some company specifically claimed anything like that?
What is the layman to make of the claim that we now have “reasoning” models? Certainly sounds like a claim of human-like cognition, even though the reality is different.
Studies have shown that corvids are capable of reasoning. Does that sound like a claim of human level cognition?
I think you’re going too far in imagining what one group of people will make of what another group of people is saying, without actually putting yourself in either group.
Much as I agree with the point about overhyping from companies, I'd be more sympathetic to this point of view if she acknowledged the merits of the technology.
Yes, it hallucinates and if you replace your brain with one of these things, you won't last too long. However, it can do things which, in the hands of someone experienced, are very empowering. And it doesn't take an expert to see the potential.
As it stands, it sounds like a case of "it's great in practice but the important question is how good it is in theory."
I use LLMs. They're somewhat useful if you're on a non-niche problem. They're also useful instead of search engines, but that's because search has been enshittified more than because an LLM is better.
However 90% of the marketing material about them is simply disgusting. The bigwigs sound like they're spreading a new religion, and most enthusiasts sound like they're new converts to some sect.
If you're marketing it as a tool, fine. If you're marketing it as the third and fourth coming of $DEITY, get lost.
> I use LLMs. They're somewhat useful if you're on a non-niche problem. They're also useful instead of search engines...
The problem for me is that I could use that type of assistance precisely when I hit that "niche problem" zone. Non-niche problems are usually already solved.
Like search. Popular search engines like Google and Bing are mostly garbage because they keep trying to shove gen AI in my face with made up answers. I have no such problems with my SearxNG instance.
> I could use that type of assistance precisely when I hit that "niche problem" zone
Tough luck. On the other hand, we're still justified in asking for money to do the niche problems with our fleshy brains, right? In spite of the likes of Altman saying every week that we'll be obsoleted in 5 years by his products. Like ... cold fusion? Always 5 years away?
[I have more hope for cold fusion than these "AIs" though.]
> Popular search engines like Google and Bing are mostly garbage because they keep trying to shove gen AI in my face with made up answers.
No, they became garbage significantly before "AI". Google at least has gradually reduced the number of results returned and expanded the search scope to the point that when you want a reminder of the I2C API syntax on a Raspberry Pi, they return 20 beginner tutorial results that show you how to unpack the damn thing and do the first login instead.
I completely agree about the marketing material. I'm not sure about 90% but that's not something I have a strong opinion on. The stream from the bigwigs is the same song being played in a different tune and I'm inoculated to it.
I'm not marketing it. I'm not a marketer. I'm a developer trying to create an informed opinion on its utility and the marketing speak you criticize is far away from the truth.
The problem is this notion that it's just complete bullshit. The way it's worded irks me. "I genuinely don't understand...". It's quite easy to see the utility, and acknowledging that doesn't, in any way, detract from valid criticisms of the technology and the people who peddle it.
Exactly. It’s so strange to read so many comments that boil down to “because some marketing people are over-promising, I will retaliate by choosing to believe false things”
But it’s not the marketers building the products. This is like saying “because the car salesman lied about this Prius’ gas mileage, I’ll retaliate by refusing to believe hybrids are any better than pure ICE cars and will buy a pickup”.
It hurts nobody but the person choosing ignorance.
I hate to bring an ad hominem into this, but Sabine is a YouTube influencer now. That's her current career. So I'd assume this Tweet storm is also pushing a narrative on its own, because that's part of doing the work she chose to do to earn a living.
While true, I think this is more likely a question of framing or anchoring — I am continuously impressed and surprised by how good AI is, but I recognise all the criticisms she's making here. They're amazing, but at the same time they make very weird mistakes.
They actually remind me of myself, as I experience being a native English speaker now living in Berlin and attempting to use a language I mainly learned as an adult.
I can often appear competent in my use of the language, but then I'll do something stupid like asking someone in the office if we have a "Gabelstapler" I can borrow — Gabelstapler is "forklift truck", I meant to ask for a stapler, which is "Tacker" or "Hefter", and I somehow managed to make this mistake directly after carefully looking up the word. (Even this is a big improvement for me, as I started off like Officer Crabtree from 'Allo 'Allo!)
What you have done there is discount statements that may build up a narrative and still remain fair... On what basis? Possibly because they do not match your own narrative?
LLMs seem akin to parts of human cognition, maybe the initial fast-thinking bit when ideas pop up in a second or two. But any human writing a review with links to sources would look them up and check that they are the right ones and match the initial idea. Current LLMs don't seem to do that, at least not the ones Sabine complains about.
Akin to human cognition but still a few bricks short of a load, as it were.
You lay the rhetoric on so thick (“cash heads”, “pushing the narrative”, “corporate overlords”, “profit-making abomination”) that it’s hard to understand your actual claim.
Are you trying to say that LLMs are useful now but you think that will stop being the case at some point in the future?
I was trying to say that LLMs will not become more than what they are, especially since they are tools for profit in the modern system. It’s like an iPhone — the latest iPhone 16 Pro is, of course, better than the iPhone 3G, but conceptually, there is nothing new. The same goes for LLMs. In 15 years, we will probably have the same hallucinating LLM, just 10% thinner and 20% faster.
The tech industry, especially big corporations, doesn’t chase innovation; it chases repeatable, predictable profit.
If we're calling out names, what about Roger Penrose, John Searle, Stuart Hameroff, Hubert Dreyfus, Henry Stapp? These are very intelligent people and I suggest getting acquainted with their work. Neural scaling laws are real, and no matter whether you put 10e-8 petaFLOP/days of compute or 200000 petaFLOP/days into training, you will hit an irreducible error constant at the Efficient Compute Frontier.
I'm not a functionalist and my belief is that AI — especially LLMs — will never achieve real understanding or consciousness, no matter how much we scale them. Language prediction is just a computation, but human thought is more than that.
Calling out names was just an argument for not dismissing the claim that AI is a thing "everyone knows" is fake.
Above you wrote "we all know the only real Intelligence ... is" as your support for attributing venal motives to people taking AI progress seriously. OK, now I know your basis for that claim. I've read three of the guys you mention, agree they're intelligent, and except for Searle they have some good things to say. But it's really unconvincing as support for an AI-is-fake claim, and especially for an everyone-knows claim.
Look man, and I'm saying this not to you but to everyone who is in this boat; you've got to understand that after a while, the novelty wears off. We get it. It's miraculous that some gigabytes of matrices can possibly interpret and generate text, images, and sound. It's fascinating, it really is. Sometimes, it's borderline terrifying.
But, if you spend too much time fawning over how impressive these things are, you might forget that something being impressive doesn't translate into something being useful.
Well, are they useful? ... Yeah, of course LLMs are useful, but we need to remain somewhat grounded in reality. How useful are LLMs? Well, they can dump out a boilerplate React frontend to a CRUD API, so I can imagine it could very well be harmful to a lot of software jobs, but I hope it doesn't bruise too many egos to point out that dumping out yet another UI that does the same thing we've done 1,000,000 times before isn't exactly novel. So it's useful for some software engineering tasks. Can it debug a complex crash? So far I'm around zero for ten, and believe me, I'm trying. From Claude 3.7 to Gemini 2.5, Cursor to Claude Code, it's really hard to get these things to work through a problem the way anyone above the junior dev level can. Almost universally, they just keep digging themselves deeper until they eventually give up and try to null out the code so that the buggy code path doesn't execute.
So when Sabine says they're useless for interpreting scientific publications, I have zero trouble believing that. Scoring high on some shitty benchmarks whose solutions are in the training set is not akin to generalized knowledge. And these huge context windows sound impressive, but dump a moderately large document into them and it's often a challenge to get them to actually pay attention to the details that matter. The best shot you have by far is if the document you need it to reference definitely was already in the training data.
It is very cool and even useful to some degree what LLMs can do, but just scoring a few more points on some benchmarks is simply not going to fix the problems current AI architecture has. There is only one Internet, and we literally lit it on fire to try to make these models score a few more points. The sooner the market catches up to the fact that they ran out of Internet to scrape and we're still nowhere near the singularity, the better.
100% this. I think we should start producing independent evaluations of these tools for their usefulness, not for whatever made-up or convoluted evaluation index OpenAI, Google, or Anthropic throw at us.
Hardly. I have pretty much been using LLMs at least weekly (most of the time daily) since GPT-3.5. I am still amazed. It's really, really hard for me not to be bullish.
It kinda reminds me of the days when I learned the Unix-like command line. At least once a week, I shouted to myself: "What? There is a one-liner that does that? People use awk/sed/xargs this way??" That's how I feel about LLMs so far.
I tried LLMs for generating shell snippets. Mixed bag for me. They seem to have a hard time making portable awk/sed commands. They also really overcomplicate things; you really don't need to break out awk for most simple file renaming tasks. For lesser-used utilities, all bets are off.
Yesterday Gemini 2.5 Pro suggested running "ps aux | grep filename.exe" to find a Wine process (pgrep is the much better way to go for that, but it's still wrong here) and get the PID, then pass that into "winedbg --attach" which is wrong in two different ways, because there is no --attach argument and the PID you pass into winedbg needs to be the Win32 one not the UNIX one. Not an impressive showing. (I already knew how to do all of this, but I was curious if it had any insights I didn't.)
For people with less experience I can see how getting e.g. tailored FFmpeg commands generated is immensely useful. On the other hand, I spent a decent amount of effort learning how to use a lot of these tools and for most of the ways I use them it would be horrific overkill to ask an LLM for something that I don't even need to look anything up to write myself.
Will people in the future simply not learn to write CLI commands? Very possible. However, I've come to a different, related conclusion: I think that these areas where LLMs really succeed in are examples of areas where we're doing a lot of needless work and requiring too much arcane knowledge. This counts for CLI usage and web development for sure. What we actually want to do should be significantly less complex to do. The LLM actually sort of solves this problem to the extent that it works, but it's a horrible kludge solution. Literally converting video files and performing basic operations on them should not require Googling reference material and Q&A websites for fifteen minutes. We've built a vastly overly complicated computing environment and there is a real chance that the primary user of many of the interfaces will eventually not even be humans. If the interface for the computer becomes the LLM, it's mostly going to be wasted if we keep using the same crappy underlying interfaces that got us into the "how do I extract tar file" problem in the first place.
They really don’t. People say this all the time, but you give any project a little time and it evolves into a special unique snowflake every single time.
That’s why every low code solution and boilerplate generator for the last 30 years failed to deliver on the promises they made.
I agree some will evolve into more, but lots of them won't. That's why Shopify, WordPress and others exist - most commercial websites are just online business cards or small shops. Designers and devs are hired to work on them all the time.
If you’re hiring a dev to work on your Shopify site, it’s most likely because you want to do something non-standard. By the time the dev gets done with it, it will be a special unique snowflake.
If your site has users, it will evolve. I’ve seen users take what was a simple trucking job posting form and repurpose an unused “trailer type” field to track the status of the job req.
Every single app that starts out as a low code/no code solution given enough time and users will evolve beyond that low code solution. They may keep using it, but they’ll move beyond being able to maintain it exclusively through a low code interface.
And most software engineering principles are about dealing with this evolution:
- Architecture (making it easy to adjust parts of the codebase and to understand it)
- Testing (making sure the current version works and that future versions won't break it)
- Requirements (describing the current version and the planned changes)
- ...
If a project was just a clone, I'm sure people would just buy the existing version and be done with it. And sometimes they do; then a unique requirement comes along and the whole process comes back into play.
If your job can be hollowed out into >90% entering prompts into AI text editors, you won't have to worry about continuing to be paid to do it every day for very long.
> Well, are they useful? ... Yeah, of course LLMs are useful, but we need to remain somewhat grounded in reality. How useful are LLMs?
They are useful enough that they can passably replace (much more expensive) humans in a lot of noncritical jobs, thus being a tangible tool for securing enterprise bottom lines.
From what I've seen in my own job and observing what my wife does (she's been working with the things on very LLM-centric processes and products in a variety of roles for about three years) not a lot of people are able to use them to even get a small productivity boost. Anyone less than very-capable trying to use them just makes a ton more work for someone more expensive than they are.
They're still useful, but they're not going to make cheap employees wildly more productive, and outside maybe a rare, perfect niche, they're not going to increase expensive employees' productivity so much that you can lay off a bunch of the cheap ones. Like, they're not even close to that, and haven't really been getting much closer despite improvements.
> they can dump out a boilerplate React frontend to a CRUD API
This is so clearly biased that it borders on parody. You can only get out what you put in.
The real use case of current LLMs is that any project that would previously require collaboration can now be done solo with a much faster turnaround. Of course in 20 years when compute finally catches up they will just be super intelligent AGI
I have Cursor running on my machine right now. I am even paying for it. This is in part because no matter what happens, people keep professing, basically every single time a new model is released, that it has finally happened: programmers are finally obsolete.
Despite the ridiculous hype, though, I have found that these things have crossed into usefulness. I imagine for people with less experience, these tools are a godsend, enabling them to do things they definitely couldn't do on their own before. Cool.
Beyond that? I definitely struggle to find things I can do with these tools that I couldn't do better without. The main advantage so far is that these tools can do these things very fast and relatively cheaply. Personally, I would love to have a tool that I can describe what I want in detailed but plain English and have it be done. It would probably ruin my career, but it would be amazing for building software. It'd be like having an army of developers on your desktop computer.
But, alas, a lot of the cool shit I'd love to do with LLMs doesn't seem to pan out. They're really good at TypeScript and web stuff, but their proficiency definitely tapers off as you veer out. It seems to work best when you can find tasks that basically amount to translation, like converting between programming languages in a fuzzy way (e.g. trying to translate idioms). What's troubling me the most is that they can generate shitloads of code but basically can't really debug the code they write beyond the most entry-level problem-solving. Reverse engineering also seems like an amazing use case, but the implementations I've seen so far definitely are not scratching the itch.
> Of course in 20 years when compute finally catches up they will just be super intelligent AGI
I am betting against this. Not the "20 years" part, it could be months for all we know; but the "compute finally catches up" part. Our brains don't burn kilowatts of power to do what they do, yet given basically unbounded time and compute, current AI architectures are simply unable to do things that humans can, and there aren't many benchmarks that are demonstrating how absolutely cataclysmically wide the gap is.
I'm certain there's nothing magical about the meat brain, as much as that is existentially challenging. I'm not sure that this follows through to the idea that you could replicate it on a cluster of graphics cards, but I'm also not personally betting against that idea, either. On the other hand, getting the absurd results we have gotten out of AI models today didn't involve modest increases. It involved explosive investment in every dimension. You can only explode those dimensions out so far before you start to run up against the limitations of... well, physics.
Maybe understanding what LLMs are fundamentally doing to replicate what looks to us like intelligence will help us understand the true nature of the brain or of human intelligence, hell if I know, but what I feel most strongly about is this: I do not believe LLMs are replicating some portion of human intelligence. They are very obviously neither a subset nor a superset of it, nor particularly close to either. They are some weird entity that overlaps with it in ways we don't fully comprehend yet.
I see a difference between seeing them as valuable in their current state vs being "bullish about LLMs" in the stock market sense.
The big problem with being bullish in the stock market sense is that OpenAI isn't selling the LLMs that currently exist to their investors, they're selling AGI. Their pitch to investors is more or less this:
> If we accomplish our goal we (and you) will have infinite money. So the expected value of any investment in our technology is infinite dollars. No, you don't need to ask what the odds are of us accomplishing our goal, because any percent times infinity is infinity.
Since OpenAI and all the founders riding on their coat tails are selling AGI, you see a natural backlash against LLMs that points out that they are not AGI and show no signs of asymptotically approaching AGI—they're asymptotically approaching something that will be amazing and transformative in ways that are not immediately clear, but what is clear to those who are watching closely is that they're not approaching Altman's promises.
The AI bubble will burst, and it's going to be painful. I agree with the author that that is inevitable, and it's shocking how few people see it. But also, we're getting a lot of cool tech out of it and plenty of it is being released into the open and heavily commoditized, so that's great!
I think that people who don't believe LLMs to be AGI are not very good at Venn diagrams. Because they certainly are artificial, general, and intelligent according to any dictionary.
Good grief. You are deeply confused and/or deeply literal. That's not the accepted definition of AGI in any sense. One does not evaluate each word as an isolated component when testing the truth of a statement containing an open compound word. Does your "living room" have organs?
It is that, or you can't recognize a tongue-in-cheek comment on goalpost shifting. The wiki page you linked has the original definition of the term from 1997; dig it up. Better yet, look at the history of that page in the Wayback Machine and see with your own eyes how the ChatGPT release changed it.
For reference, 1997 original: By advanced artificial general intelligence, I mean AI systems that rival or surpass the human brain in complexity and speed, that can acquire, manipulate and reason with general knowledge, and that are usable in essentially any phase of industrial or military operations where a human intelligence would otherwise be needed.
2014 wiki requirements: reason, use strategy, solve puzzles, and make judgments under uncertainty; represent knowledge, including commonsense knowledge; plan; learn; communicate in natural language; and integrate all these skills towards common goals.
No, it's really not. Joining words into a compound word enables the new compound to take on new meaning and evolve on its own, and if it becomes widely used as a compound it always does so. The term you're looking for if you care to google it is an "open compound noun".
A dog in the sun may be hot, but that doesn't make it a hot dog.
You can use a towel to dry your hair, but that doesn't make the towel a hair dryer.
Putting coffee on a dining room table doesn't turn it into a coffee table.
Spreading Elmer's glue on your teeth doesn't make it tooth paste.
The White House is, in fact, a white house, but my neighbor's white house is not The White House.
I could go on, but I think the above is a sufficient selection to show that language does not, in fact, work that way. You can't decompose a compound noun into its component morphemes and expect to be able to derive the compound's meaning from them.
You wrote so much while failing to read so little:
> in most cases
What do you think will happen if we start comparing the lengths of the list ["hot dog", ...] and the list ["blue bird", "aeroplane", "sunny March day", ...]?
No, I read that, and it's wrong. Can you point me to a single compound noun that works that way?
A bluebird is a specific species. A blue parrot is not a bluebird.
An aeroplane is a vehicle that flies through the air at high speeds, but if you broke it down into morphemes and tried to reason it out that way you could easily argue that a two-dimensional flat surface that extends infinitely in all directions and intersects the air should count.
Sunny March day isn't a compound noun, it's a noun phrase.
Can you point me to a single compound noun (that is, a two-or-more-part word that is widely used enough to earn a definition in a dictionary, like AGI) that can be subjected to the kind of breaking apart into morphemes that you're doing without yielding obviously nonsensical re-interpretations?
I feel like LLMs are the same as the leap from "world before web search" to "world after web search." Yeah, in google, you get crap links for sure, and you have to wade through salesy links and random blogs. But in the pre-web-search world, your options were generally "ask a friend who seems smart" or "go to the library for quite a while," AND BOTH OF THOSE OPTIONS HAD PLENTY OF ISSUES. I found a random part in an old arduino kit I bought years ago, and GPT-4o correctly identified it and explained exactly how to hook it up and code for it to me. That is frickin awesome, and it saves me a ton of time and leads me to reuse the part. I used DeepResearch to research car options that fit my exact needs, and it was 100% spot on - multiple people have suggested models that DeepResearch did not identify that would be a fit, but every time I dig in, I find that DeepResearch was right and the alternative actually had some dealbreaker I had specified. Etc., etc.
In the 90s, Robert Metcalfe infamously wrote "Almost all of the many predictions now being made about 1996 hinge on the Internet’s continuing exponential growth. But I predict the Internet, which only just recently got this section here in InfoWorld, will soon go spectacularly supernova and in 1996 catastrophically collapse." I feel like we are just hearing LLM versions of this quote over and over now, but they will prove to be equally accurate.
Generic. For the Internet, more complex questions would have been "What are the potential benefits, what are the potential risks, what will grow faster," etc. The problem is not the growth but what that growth means. For LLMs, the big clear question is "will they stop just being LLMs, and when will they?" Progress is seen, but we seek a revolution.
It would be fine if it were sold that way, but there is so much hype. We're told that it's going to replace all of us and put us all out of our jobs. They set the expectations so high. Like, remember OpenAI showing a video of it doing your taxes for you? Predictions that super-intelligent AI is going to radically transform society faster than we can keep up? I think that's where most of the backlash is coming from.
> We're told that it's going to replace all of us and put us all out of our jobs.
I think this is the source of a lot of the hype. There are people salivating at the thought of no longer needing to employ the peasant class. They want it so badly that they'll say anything to get more investment in LLMs even if it might only ever allow them to fire a fraction of their workers, and even if their products and services suffer because the output they get with "AI" is worse than what the humans they throw away were providing.
They know they're overselling it, but they're also still on their knees praying that by some miracle their LLMs trained on the collective wisdom of Facebook and YouTube comments will one day gain actual intelligence and they can stop paying human workers.
In the meantime, they'll shove "AI" into everything they can think of for testing and refinement. They'll make us beta test it for them. They don't really care if their AI makes your customer service experience go to shit. They don't care if their AI screws up your bill. They don't care if their AI rejects your claims or you get denied services you've been paying for and are entitled to. They don't care if their AI unfairly denies you parole or mistakenly makes you the suspect of a crime. They don't care if Dr. Sbaitso 2.0 misdiagnoses you. Your suffering is worth it to them as long as they can cut their headcount by any amount and can keep feeding the AI more and more information because just maybe with enough data one day their greatest dream will become reality, and even if that never happens a lot of people are currently making massive amounts of money selling that lie.
The problem is that the bubble will burst eventually. The more time goes by and AI doesn't live up to the hype, the harder that hype becomes to sell. Especially when, by shoving AI into everything, they're exposing a lot of hugely embarrassing shortcomings. Repeating "AI will happen in just 10 more years" gives people a lot of time to make money and cash out, though.
On the plus side, we do get some cool toys to play with and the dream of replacing humans has sparked more interest in robotics so it's not all bad.
Yeah, it won't do your taxes for you, but it can sure help you do them yourself. Probably won't put you out of your job either, but it might help you accomplish more. Of course, one result of people accomplishing more in less time is that you need fewer people to do the same amount of work - so some jobs could be lost. But it's also possible that for the most part instead, more will be accomplished overall.
People frame that like it's something we gain, efficiency, as if before we were wasting time by thinking for ourselves. I get that they can do certain things better, but I'm not sure that delegating to them is free of charge. We're paying something, losing something. Probably learning and fulfillment. We become increasingly dependent on machines to do anything.
Something important happened when we turned the tables around, I don't feel it gets the credit it should. It used to be humans telling machines what to do. Now we're doing the opposite.
And it might even be right and not get you in legal trouble! Not that you'd know (until audit day) unless you went back and did them as a verification though.
Except now, you can hire a competent professional accountant and discover on audit day that they got taken over by private equity, replaced 90% of the professionals doing work with AI and made a lot of money before the consequences become apparent.
Yes, but you're going to pay through the nose for the "wouldn't have to worry about legal trouble at all" (part of what you're paying for with professional services is a degree of protection from their fuckups).
So going back to apples-and-apples comparison, i.e. assuming that "spend a lot of money to get it done for you" is not on the table, I'd trust current SOTA LLM to do a typical person's taxes better than they themselves would.
I pay my accountant 500 USD to file my taxes. I don't consider that "through the nose" relative to my inflated tech salary.
If a person is making a smaller income their tax situation is probably very simple, and can be handled by automated tools like TurboTax (as the sibling comment suggests).
I don't see a lot of value add from LLMs in this particular context. It's a situation where small mistakes can result in legal trouble or thousands of dollars of losses.
I'm on a financial forum where people often ask tax questions, generally _fairly_ simple questions. An obnoxious recent trend on many forums, including this one, is idiots feeding questions into a magic robot and posting what it says as a response. Now, ChatGPT may be very good at, er, something, I dunno, I am assured that it has _some_ use by the evangelists, but it is not good at tax, and if people follow many of the answers it gives then they are likely to get in trouble.
If a trillion-parameter model can't handle your taxes, that to me says more about the tax code than the AI code.
People who paste undisclosed AI slop in forums deserve their own place in hell, no argument there. But what are some good examples of simple tax questions where current models are dangerously wrong? If it's not a private forum, can you post any links to those questions?
So, a super-basic one I saw recently, in relation to Irish tax. In Ireland, ETFs are taxed differently to normal stocks (most ETFs available here are accumulating, they internally re-invest dividends; this is uncommon for US ETFs for tax reasons). Normal stocks have gains taxed under the capital gains regime (33% on gains when you sell). ETFs are different; they're taxed 40% on gains when you sell, and they are subject to 'deemed disposal'; every 8 years, you are taxed _as if you had sold and re-bought_. The ostensible reason for this is to offset the benefit from untaxed compounding of dividends.
Anyway, the magic robot 'knew' all that. Where it slipped up was in actually _working_ with it. Someone asked for a comparison of taxation on a 20 year investment in individual stocks vs ETFs, assuming re-investment of dividends and the same overall growth rate. The machine happily generated a comparison showing individual stocks doing massively better... On closer inspection, it was comparing growth for 20 years for the individual stocks to growth of 8 years for the ETFs. (It also got the marginal income tax rate wrong.)
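(To make the failure concrete, here is a rough sketch of the straightforward version of that comparison. The principal and growth rate are my own illustrative assumptions, not the original poster's; it ignores dividend taxation entirely and just applies the 33%/40% rates and 8-year deemed disposal described above:)

    # Rough sketch of the 20-year stocks-vs-ETF comparison described above.
    # Assumptions (mine, purely illustrative): a 100k lump sum, 6% annual growth,
    # no dividend tax modelling, deemed-disposal tax paid out of the holding,
    # and the cost basis stepped up after each deemed disposal.

    def stocks_after(principal=100_000.0, growth=0.06, years=20, cgt=0.33):
        value = principal * (1 + growth) ** years
        return value - cgt * (value - principal)      # CGT only on the final sale

    def etf_after(principal=100_000.0, growth=0.06, years=20, exit_tax=0.40):
        value, basis = principal, principal
        for year in range(1, years + 1):
            value *= 1 + growth
            if year % 8 == 0:                         # deemed disposal every 8 years
                value -= exit_tax * (value - basis)   # pay the tax out of the holding
                basis = value                         # basis resets after deemed disposal
        return value - exit_tax * (value - basis)     # tax on the real sale at year 20

    print(f"stocks over 20 years: {stocks_after():,.0f}")
    print(f"ETF over 20 years:    {etf_after():,.0f}")

Under those assumptions the individual stocks still come out ahead, but nowhere near the "massively better" result you get by accidentally compounding the ETF for only 8 of the 20 years.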
But the nonsense it spat out _looked_ authoritative on first glance, and it was a couple of replies before it was pointed out that it was completely wrong. The problem isn't that the machine doesn't know the rules; insofar as it 'knows' anything, it knows the rules. But it certainly can't reliably apply them.
(I'd post a link, but they deleted it after it was pointed out that it was nonsense.)
Interesting, thanks. That doesn't seem like an entirely simple question, but it does demonstrate that the model is still not great at recognizing when it is out of its league and should either hedge its answer, refuse altogether, or delegate to an appropriate external tool.
This failure seems similar to a case that someone brought up earlier ( https://news.ycombinator.com/item?id=43466531 ). While better than expected at computation, the transformer model ultimately overestimates its own ability, running afoul of Dunning-Kruger much like humans tend to.
Replying here due to rate-limiting:
One interesting thing is that when one model fails spectacularly like that, its competitors often do not. If you were to cut/paste the same prompt and feed it to o1-pro, Claude 3.7, and Gemini 2.5, it's possible that they would all get it wrong (after all, I doubt they saw a lot of Irish tax law during training.) But if they do, they will very likely make different errors.
Unfortunately it doesn't sound like that experiment can be run now, but I've run similar tests often enough to tell me that wrong answers or faulty reasoning are more likely model-specific shortcomings rather than technology-specific shortcomings.
That's why I get triggered when people speak authoritatively on here about what AI models "can't do" or "will never be able to do." These people have almost always, almost without exception, been proven dead wrong in the past, but that never seems to bother them.
It's the sort of mistake that it's hard to imagine a human making, is the thing. Many humans might have trouble compounding at all, but the 20 year/8 year confusion just wouldn't happen. And I think it is on the simple side of tax questions (in particular all the _rules_ involved are simple, well-defined, and involve no ambiguity or opinion; you certainly can't say that of all tax rules). Tax gets _complicated_.
This reminds me of the early days of Google, when people who knew how to phrase a query got dramatically better results than those who basically just entered what they were looking for as if asking a human.
And indeed, phrasing your prompts is important here too, but I mean more that by having a bit of an understanding of how it works and how it differs from a human, you can avoid getting sucked in by most of these gaps in its abilities, while benefitting from what it's good at. I would ask it the question about the capital gains rules (and would verify the response probably with a link I'd ask it to provide), but I definitely wouldn't expect it to correctly provide a comparison like that. (I might still ask, but would expect to have to check its work.)
Forget OpenAI ChatGPT doing your taxes for you. Now Gemini will write up your sales slides about Gouda cheese, stating wrongly in the process that gouda makes up about 50% of all cheese consumption worldwide :) These use-cases are getting more useful by the day ;)
I mean, it's been like 3 years. 3 years after the web came out was barely anything. 3 years after the first GPU was cool, but not that cool. The past three years in LLMs? Insane.
Things could stall out and we'll have bumps and delays ... I hope. If this thing progresses at the same pace, or speeds up, well ... reality will change.
Or not. Even as they are, we can build some cool stuff with them.
> And people just sit around, unimpressed, and complain that ... what ... it isn't a perfect superintelligence that understands everything perfectly?
The trouble is that, while incredibly amazing, mind blowing technology, it falls down flat often enough that it is a big gamble to use. It is never clear, at least to me, what it is good at and what it isn't good at. Many things I assume it will struggle with, it jumps in with ease, and vice versa.
As the failures mount, I admittedly do find it becoming harder and harder to compel myself to see if it will work for my next task. It very well might succeed, but by the time I go to all the trouble to find out it often feels that I may as well just do it the old fashioned way.
If I'm not alone, that could be a big challenge in seeing long-term commercial success. Especially given that commercial success for LLMs is currently defined as 'take over the world' and not 'sustain mom and pop'.
> the speed at which it is progressing is insane.
But the same goes for the users! As a result, the failure rate appears to be closer to a constant. Until we reach the end of human achievement, where humans can no longer think of new ways to use LLMs, that is unlikely to change.
It's becoming clear to me that some people just have vastly different uses and use cases than I do. Summarizing a deep, cutting-edge physics paper is, I'm sure, vastly different from summarizing a web page while I'm browsing HN, or writing a Python plugin for Icinga to monitor a web endpoint that spits out JSON.
The author says they use several LLMs every day and they always produce incorrect results. That "feels" weird, because it seems like you'd develop an intuition fairly quickly for the kinds of questions you'd ask that LLMs can and can't answer. If I want something with links to back up what is being said, I know I should ask Perplexity or maybe just ask a long-form prompt-like question of Google or Kagi. If I want a Python or bash program I'm probably going to ask ChatGPT or Gemini. If I want to work on some code I want to be in Cursor and am probably using Claude. For general life questions, I've been asking Claude and ChatGPT.
Running into the same issue with LLMs over and over for years, with all due respect, seems like the "doing the same thing and expecting different results" situation.
This is so true. I really hope she joins this conversation so we can have a productive discussion and understand what she's actually hoping to achieve.
The two sides are never going to understand each other because I suspect we work on entirely different things and have radically different workflows. I suspect that hackernews gets more use out of LLMs in general than the average programmer because they are far more likely to be at a web startup and more likely to actually be bottlenecked on how fast you can physically put more code in the file and ship sooner.
If you work on stuff that is at all niche (as in, stack overflow was probably not going to have the answer you needed even before LLMs became popular), then it's not surprising when LLMs can't help because they've not been trained.
For people that were already going fast and needed or wanted to put out more code more quickly, I'm sure LLMs will speed them up even more.
For those of us working on niche stuff, we weren't going fast in the first place or being judged on how quickly we ship in all likelihood. So LLMs (even if they were trained on our stuff) aren't going to be able to speed us up because the bottleneck has never been about not being able to write enough code fast enough. There are architectural and environmental and testing related bottlenecks that LLMs don't get rid of.
That's a good point, I've personally not got much use out of LLMs (I use them to generate fantasy names for my D&D campaign, but find they fall down for anything complex) - but I've also never got much use out of StackOverflow either.
I don't think I'm working on anything particularly niche, but nor is it cookie-cutter generic either, and that could be enough to drastically reduce their utility.
Two things can be true: e.g., that LLMs are incredible tech we only dreamed of having, and that they’re so flawed that they’re hard to put to productive use.
I just tried to use the latest Gemini release to help me figure out how to do some very basic Google Cloud setup. I thought my own ignorance in this area was to blame for the 30 minutes I spent trying to follow its instructions - only to discover that Gemini had wildly hallucinated key parts of the plan. And that’s Google’s own flagship model!
I think it’s pretty telling that companies are still struggling to find product-market fit in most fields outside of code completion.
It really depends on the task. Like Sabine, I’m operating on the very frontier of a scientific domain that is extremely niche. Every single LLM out there is worse than useless in this domain. It spits out incomprehensible garbage.
But ask it to solve some leet code and it’s brilliant.
The question I ask afterwards, then, is: is solving some leet code brilliant? Is designing a simple inventory system brilliant if they've all been accomplished already? My answer tends towards no, since they still make mistakes in the process, and it keeps newer developers from learning.
It is a manner of speaking. That said, I have seen LLMs do brilliant work. There are just some things, like the hard sciences, where their understanding is only surface deep.
I should start collecting examples, if only for threads like this. Recently I tried to get an LLM to write a tsserver plugin that treats lines ending with "//del" as empty. You can only imagine all the sneaky failures in the chat and the total uselessness of the results.
Anything that does not appear literally millions (billions?) of times in the training set is doomed to be fantasized about by an LLM. In various ways, tones, etc. After many such threads I came to the conclusion that people who find it mostly useful are simply treading water, as they probably have done for most of their career. Their average product is a React form with a CRUD endpoint and excitement about it. I can't explain their success reports otherwise, because it rarely works on anything beyond that.
Welcome to the new digital divide people, and the start of a new level of "inequality" in this world. This thread is proof that we've diverged and there is a huge subset of people that will not have their minds changed easily.
Hallucinating incorrect information is worse than useless. It is actively harmful.
I wonder how much this affects our fundraising, for example. No VC understands the science here, so they turn to advisors (which is great!) or to LLMs… which has us starting off on the wrong foot.
I work in a field that is not even close to a scientific niche - software reverse engineering - and LLMs will happily lie to me all the time, for every question I have. I find them useful for generating some initial boilerplate but... that's it. AI autocompletion saved me an order of magnitude more time, and nobody is hyped about it.
Sabine is Lex Fridman for women. Stay in your lane about quantum physics and stop trying to opine on LLMs. I’m tired of seeing the huge amount of FUD from her.
Because it has a sample size of our collective human knowledge and language big enough to trick our brains into believing that.
As a parallel thought, it reminds me of a trick Derren Brown did. He picked every horse correctly across 6 races. The person he was picking for was obviously stunned, as were the audience watching it.
The reality of course is just that people couldn't comprehend that he simply had to go to extreme and tedious lengths to make this happen. They started with 7000 people and filmed every one as if they were going to be the "one", and then the probability pyramid just dropped people out. It was such a vast undertaking of time and effort that we're biased towards believing there must be something really happening here.
LLMs currently are a natural language interface to a Microsoft Encarta-like system that is so unbelievably detailed and all-encompassing that we risk accepting that there's something more going on there. There isn't.
Again, it's not intelligence. It's a mirror that condenses our own intelligence and reflects it back to us, using probabilities at a scale that tricks us into the notion that there is something more than just a big index and a clever search interface.
There is no meaningful interpretation of the word intelligence that applies, psychologically or philosophically, to what is going on. Machine Learning is far more apt and far less misleading.
I saw the transition from ML to AI happen in academic papers and then pitch decks in real time. It was to refill the well when investors were losing faith that ML could deliver on the promises. It was not progress driven.
This doesn't make any more sense than calling LLMs "intelligence". There is no "our intelligence" beyond a concept or an idea that you or someone else may have about the collective, which is an abstraction.
What we do each have is our own intelligence, and that intelligence is, and likely always will be, no matter how science progresses, ineffable. So my point is you can't say your made-up/ill-defined concept is any realer than any other made-up/ill-defined concept.
> Wah, it can't write code like a Senior engineer with 20 years of experience!
No, that's not my problem with it. My problem with it is that fabrication is built into all LLMs; they'll make things up a lot. What's worse, people are treating them as authoritative.
Sure, sometimes it produces useful code. And often, it'll simply call the "doTheHardPart()" method. I've even caught it literally writing the wrong algorithm when asked to implement a specific and well-known algorithm. For example, asking it to "write a selection sort" and watching it write a bubble sort instead. No amount of re-prompting pushes it to the right algorithm in those cases either; it'll regenerate the same wrong algorithm over and over.
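(For anyone who hasn't thought about the difference in a while, here is roughly what distinguishes the two; this is my own illustration, not code from any model:)

    def selection_sort(items):
        """Selection sort: each pass finds the minimum of the unsorted suffix
        and swaps it into place. O(n^2) comparisons, at most n-1 swaps."""
        a = list(items)
        for i in range(len(a)):
            smallest = min(range(i, len(a)), key=a.__getitem__)
            a[i], a[smallest] = a[smallest], a[i]
        return a

    def bubble_sort(items):
        """Bubble sort: repeatedly swap adjacent out-of-order pairs until a full
        pass makes no swaps. Same complexity class, but a different algorithm,
        so producing this when asked for a selection sort is simply wrong."""
        a = list(items)
        swapped = True
        while swapped:
            swapped = False
            for i in range(len(a) - 1):
                if a[i] > a[i + 1]:
                    a[i], a[i + 1] = a[i + 1], a[i]
                    swapped = True
        return a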
Outside of programming, this is much worse. I've both seen online and heard people quote LLM output as if it were authoritative. That to me is the bigger danger of LLMs to society. People just don't understand that LLMs aren't high-powered attorneys or world-renowned doctors. And, unfortunately, the incorrect perception of LLMs is being hyped both by LLM companies and by "journalists" who are all too ready to simply run with and discuss the press releases from said LLM companies.
Unfortunately they are trained first and foremost as plausibility engines. The central dogma is that plausibility will (with continuing progress & scale) converge towards correctness, or "faithfulness" as it's sometimes called in the literature.
This remains very far from proven.
The null hypothesis that would be necessary to reject, therefore, is a most unfortunate one, viz. that by training for plausibility we are creating the world's most convincing bullshit machines.
> plausibility [would] converge towards correctness
That is a horribly dangerous idea. We demand that the agent not guess, even - and especially - when the agent is a champion at guessing; we demand that the agent check.
If G guesses from the multiplication table with remarkable success, we more strongly demand that G computes its output accurately instead.
Oracles whose extraordinary average accuracy makes people forget they are not computers are dangerous.
One man's "plausibility" is another person's "barely reasoned bullshit". I think you're being generous, because LLMs explicitly don't deal in facts; they deal in making stuff up that is vaguely reminiscent of fact. Only a few companies are even trying to make reasoning (as in axioms-cum-deductions, i.e., logic per se) a core part of the models, and they're really struggling to hand-engineer the topology and methodology necessary for that to work even roughly as a facsimile of technical reasoning.
I’m not really being generous. I merely think if I’m gonna condemn something as high-profile snake oil for the tragically gullible, it’s helpful to have a solid basis for doing so. And it’s also important to allow oneself to be wrong about something, however remote the possibility may currently seem, and preferably without having to revise one’s principles to recognise it.
As a sort of related anecdote... if you remember the days before Google, people sitting around your dinner table arguing about stuff used to spew all sorts of bullshit, then drop that they have a degree from XYZ university, and they won the argument. When Google/Wikipedia came around, it turned out that those people were in fact just spewing bullshit. I'm sure there was some damage, but it feels like a similar thing. Our "bullshit-radar" seems to be able to adapt to these sorts of things.
Well, conspiracy theories are thriving in this day and age, even with access to technology and information at one's fingertips. Add to that a US administration effectively spewing bullshit every few minutes.
The best example of this was an argument I had a little while ago about self-driving. I mentioned that I have a hard time trusting any system relying only on cameras, to which I was told that I didn't understand how machine learning works, that obviously they were correct and I was wrong, and that every car would be self-driving within 5 years. All of these things could easily be verified independently.
Suffice to say that I am not sure that the "bullshit-radar" is that adaptive...
Mind you, this is not limited to the particular issue at hand, but I think those situations need to be highlighted, because we get fooled easily by authoritative delivery...
Language models are closing the gaps that still remain at an amazing rate. There are still a few gaps, but if we consider what has happened just in the last year, and extrapolated 2-3 years out....
I think you are discounting the fact that you can weed out people who make a habit of that, but you can't do that with LLMs if they are all doing that.
Some people trust Alex Jones, while the vast majority realize that he just fabricates untruths constantly. Far fewer people realize that LLMs do the same.
People know that computers are deterministic, but most don't realize that determinism and accuracy are orthogonal. Most non-IT people give computers authoritative deference they do not deserve. This has been a huge issue with things like Shot Spotter, facial recognition, etc.
One thing I see a lot on X is people asking Grok what movie or show a scene is from.
LLMs must be really, really bad at this because not only is it never right, it actually just makes something up that doesn't exist. Every, single, time.
I really wish it would just say "I'm not good at this, so I do not know."
When your model of the world is built on the relative probabilities of the next opaque apparently-arbitrary number in the context of prior opaque apparently-arbitrary numbers, it must be nearly impossible to tell the difference between “there are several plausible ways to proceed, many of which the user will find useful or informative, and I should pick one” and “I don’t know”. Attempting to adjust to allow for the latter probably tends to make the things output “I don’t know” all the time, even when the output they’d have otherwise produced would have been good.
I thought about this of course, and I think a reasonable 'hack' for now is to more or less hardcode things that your LLM sucks at, and override it to say it doesn't know. Because continually failing at basic tasks is bad for confidence in said product.
I mean, it basically does the same thing if you ask it to do anything racist or offensive, so that override ability is obviously there.
So if it identifies the request as identifying a movie scene, just say 'I don't know', for example.
Hardcode by whom? Who do we trust with this task to do it correctly? Another LLM that suffers from the same fundamental flaw or by a low paid digital worker in a developing country? Because that's the current solution. And who's gonna pay for all that once the dumb investment money runs out, who's gonna stick around after the hype?
By the LLM team (Grok team, in this case). I don't mean for the LLM to be sentient enough to know it doesn't know the answer, I mean for the LLM to identify what is being asked of it, and checking to see if that's something on the 'blacklist of actions I cannot do yet', said list maintained by humans, before replying.
No different than when asking ChatGPT to generate images or videos or whatever before it could, it would just tell you it was unable to.
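As a rough sketch of what I mean - all the names below are made up for illustration, and this is just a guard in front of the model, not how any vendor actually does it:

    # Hypothetical "blacklist of actions I cannot do yet", maintained by humans.
    UNSUPPORTED_TASKS = {
        "identify_movie_scene": "I'm not good at identifying movie or TV scenes, so I don't know.",
        "identify_song_from_clip": "I can't reliably identify songs, so I don't know.",
    }

    def answer(request, classify_intent, generate):
        # classify_intent() and generate() stand in for the vendor's own
        # intent classifier and main model; both are placeholders here.
        task = classify_intent(request)
        if task in UNSUPPORTED_TASKS:
            return UNSUPPORTED_TASKS[task]
        return generate(request)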
> It's impossible to predict with certainty who will be the U.S. President in 2046. The political landscape can change significantly over time, and many factors, including elections, candidates, and events, will influence the outcome. The next U.S. presidential election will take place in 2028, so it would be difficult to know for sure who will hold office nearly two decades from now.
It can do this because it is in fact the most likely thing to continue with, word by word.
But the most likely way to continue a paper is not to say "I don't know" at the end. It is actually to provide sources, which it proceeds to do wrongly.
>> We need an AI technology that can output "don't know" when appropriate. How's that coming along?
Heh. Easiest answer in the world. To be able to say "don't know", one has first to be able to "know". And we ain't there yet, not by a long way. Not even within a million miles of it.
Needs meta-annotation of certainty on all nodes and tokens that accumulates while reasoning. That would also give the ability to train in beliefs, as in overriding any uncertainty. Right now we are in the pure belief phase. AI is its own god right now, pure blissful belief without the sin of doubt.
Sure we have. We don't have a perfect solution but it's miles better than what we have for LLMs.
If a lawyer consistently makes stuff up on legal filings, in the worst cases they can lose their license (though they'll most likely end up getting fines).
If a doctor really sucks, they become uninsurable and ultimately could lose their medical license.
Devs that don't double check their work will cause havoc with the product and, not only will they earn low opinions from their colleagues, they could face termination.
How many companies train on data that contains "I don't know" responses? Have you ever talked with a toddler / young child? You need to explicitly teach children not to bullshit. At least I needed to teach mine.
Never mind toddlers, have you ever hired people? A far smaller proportion of professional adults will say “I don’t know” than a lot of people here seem to believe.
No, I call judgement a logical process of assessment.
You have an amount of material that speaks of the endeavours in some sport of some "Michael Jordan"; the logic in the system decides that if a "Michael Jordan" in context can be construed to be "that" "Michael Jordan", then there is a sound probability he is a sportsman. You have very little material about a "John R. Brickabracker"; the logic in the system decides that the material is insufficient to take a good guess.
Then I expect your personal fortunes are tied up in hyping the "generative AI are just like people!" meme. Your comment is wholly detached from the reality of using LLMs. I do not expect we'll be able to meet eye-to-eye on the topic.
This exists: each next token has a probability assigned to it. High probability means "it knows"; if there are two or more tokens of similar probability, or the probability of the first token is low in general, then you are less confident about that datum.
Of course there are areas where there's more than one possible answer, but both possibilities are very consistent. I feel LLMs (ChatGPT) do this fine.
Also, can we stop pretending with the generic name for ChatGPT? It's like calling Viagra "sildenafil" instead of Viagra. Cut it out; there's the real deal and there are imitations.
> low in general, then you are less confident about that datum
It’s very rarely clear or explicit enough when that’s the case. Which makes sense considering that the LLMs themselves do not know the actual probabilities
Maybe this wasn't clear, but the probabilities are a low-level variable that may not be exposed in the UI; it IS exposed through the API as logprobs in the ChatGPT API. And of course, if you have binary access, like with a Llama LLM, you may have even deeper access to this p variable.
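For example, a rough sketch using the openai Python package (the model name is just an example; any chat model that returns logprobs should behave similarly):

    import math
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name
        messages=[{"role": "user", "content": "When was Mr. T born?"}],
        logprobs=True,
        top_logprobs=3,  # also return the 3 most likely alternatives per position
    )

    for tok in resp.choices[0].logprobs.content:
        # Convert log-probabilities to probabilities; a low top probability or
        # close runners-up suggest the model was guessing at that token.
        alts = [(alt.token, round(math.exp(alt.logprob), 3)) for alt in tok.top_logprobs]
        print(repr(tok.token), round(math.exp(tok.logprob), 3), alts)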
> it IS exposed through the API as logprobs in the ChatGPT API
Sure but they often are not necessarily easily interpretable or reliable.
You can use it to compare a model’s confidence of several different answers to the same question but anything else gets complicated and not necessarily that useful.
This is very subjective, but I feel they are all imitators of ChatGPT. I also contend that the ChatGPT API (and UI) will become, or has become, a de facto standard in the same manner that Intel's 8086 instruction set evolved into x86.
would you rather the LLM make up something that sounds right when it doesn't know, or would you like it to claim "i don't know" for tasks it actually can figure out? because presumably both happen at some rate, and if it hallucinates an answer i can at least check what that answer is or accept it with a grain of salt.
nobody freaks out when humans make mistakes, but we assume our nascent AIs, being machines, should always function correctly all the time
> would you rather the LLM make up something that sounds right when it doesn't know, or would you like it to claim "i don't know" for tasks it actually can figure out?
And that's part of the problem - you're thinking of it like a hammer when it's not a hammer. It's asking someone at a bar a question. You'll often get an answer - but even if they respond confidently that doesn't make it correct. The problem is people assuming things are fact because "someone at a bar told them." That's not much better than, "it must be true I saw it on TV".
It's a different type of tool - a person has to treat it that way.
Asking a question is very contextual. I don't ask a lawyer about house engineering problems, nor my doctor how to bake a cake. That means that if I'm asking someone at a bar, I'm already prepared to deal with the fact that the person is maybe drunk and probably won't know... And more often than not, I won't even ask the question unless there's a dire need, because it's the most inefficient way to get an informed answer.
I wouldn't bat an eye if people were taking code suggestions, then reviewing and editing them to make them correct. But from what I see, it's pretty much a direct push to production if they got it to compile, which is different from correct.
> What's worse, people are treating them as authoritative. … I've both seen online and heard people quote LLM output as if it were authoritative.
That's not an LLM problem. But indeed quite bothersome. Don't tell me what ChatGPT told you. Tell me what you know. Maybe you got it from ChatGPT and verified it. Great. But my jaw kind of drops when people cite an LLM and just assume it’s correct.
It might not be an LLM problem, but it’s an AI-as-product problem. I feel like every major player’s gamble is that they can cement distinct branding and model capabilities (as perceived by the public) faster than the gradual calcification of public AI perception catches up with model improvements - every time a consumer gets burned by AI output in even small ways, the “AI version of Siri/Alexa only being used for music and timers” problem looms a tiny, tiny bit larger.
These are not similar. Wikipedia says the same thing to everybody, and when what it says is wrong, anybody can correct it, and they do. Consequently it's always been fairly reliable.
Lies and mistakes persist on Wikipedia for many years. They just need to sound truthy so they don't jump out at Wikipedia power users who aren't familiar with the subject. I've been keeping tabs on one for about five years, and it's several years older than that, which I won't correct because I am IP range banned and I don't feel like making an account and dealing with any basement-dwelling power editor NEETs who read Wikipedia rules and processes for fun. I know I'm not the only one to notice it, because this glaring error isn't in a particularly obscure niche; it's in the article for a certain notorious defense initiative which has been in the news lately, so this error has plenty of eyes on it.
In fact, the error might even be a good thing; it reminds attentive readers that Wikipedia is an unreliable source and you always have to check if citations actually say the thing which is being said in the sentence they're attached to.
That's true too, but the bigger difference from my point of view is that factual errors in Wikipedia are relatively uncommon, while, in the LLM output I've been able to generate, factual errors vastly outnumber correct facts. LLMs are fantastic at creativity and language translation but terrible at saying true things instead of false things.
Comments like these honestly make me much more concerned than LLM hallucinations. There have been numerous times when I've tracked down the source for a claim, only to find that the source was saying something different, or that the source was completely unreliable (sometimes on the crackpot level).
Currently, there's a much greater understanding that LLM's are unreliable. Whereas I often see people treat Wikipedia, posts on AskHistorians, YouTube videos, studies from advocacy groups, and other questionable sources as if they can be relied on.
The big problem is that people in general are terrible at exercising critical thinking when they're presented with information. It's probably less of an issue with LLMs at the moment, since they're new technology and a certain amount of skepticism gets applied to their output. But the issue is that once people have gotten more used to them, they'll turn off their critical thinking in the same manner that they turn it off when absorbing information from other sources that they're used to.
Wikipedia is fairly reliable if our standard isn't a platonic ideal of truth but real-world comparators. Reminds me of Kant's famous line. "From the crooked timber of humankind, nothing entirely straight can be made".
The sell of Wikipedia was never "we'll think so you don't have to", it was never going to disarm you of your skepticism and critical thought, and you can actually check the sources. LLMs are sold as "replace knowledge work(ers)", you cannot check their sources, and the only way you can check their work is by going to something like Wikipedia. They're just fundamentally different things.
> The sell of Wikipedia was never "we'll think so you don't have to", it was never going to disarm you of your skepticism and critical thought, and you can actually check the sources.
You can check them, but Wikipedia doesn't care what they say. When I checked a citation on the French Toast page, and noted that the source said the opposite of what Wikipedia did by annotating that citation with [failed verification], an editor showed up to remove that annotation and scold me that the only thing that mattered was whether the source existed, not what it might or might not say.
I feel like I hear a lot of criticism about Wikipedia editors, but isn't Wikipedia overall pretty good? I'm not gonna defend every editor action or whatever, but I think the product stands for itself.
Wikipedia is overall pretty good, but it sometimes contains erroneous information. LLMs are overall pretty good, but they sometimes contain erroneous information.
The weird part is when people get really concerned that someone might treat the former as a reliable source, but then turn around and argue that people should treat the latter as a reliable source.
I had a moment of pique where I was just gonna copy-paste my reply to this rehash of your original point that is non-responsive to what I wrote, but I've composed myself. Instead, I will link to the Wikipedia article for Equivocation [0] and ChatGPT's answer to "are wikipedia and LLMs alike?"
Wikipedia occasionally has errors, which are usually minor. The LLMs I've tried occasionally get things right, but mostly emit limitless streams of plausible-sounding lies. Your comment paints them as much more similar than they are.
In my experience, it's really common for wikipedia to have errors, but it's true that they tend to be minor. And yes, LLMs mostly just produce crazy gibberish. They're clearly worse than wikipedia. But I don't think wikipedia is meeting a standard it should be proud of.
> Whereas I often see people treat Wikipedia, posts on AskHistorians, YouTube videos, studies from advocacy groups, and other questionable sources as if they can be relied on.
One of these things is not like the others! Almost always, when I see somebody claiming Wikipedia is wrong about something, it's because they're some kind of crackpot. I find errors in Wikipedia several times a year; probably the majority of my contribution history to Wikipedia https://en.wikipedia.org/wiki/Special:Contributions/Kragen consists of me correcting errors in it. Occasionally my correction is incorrect, so someone corrects my correction. This happens several times a decade.
By contrast, I find many YouTube videos and studies from advocacy groups to be full of errors, and there is no mechanism for even the authors themselves to correct them, much less for someone else to do so. (I don't know enough about posts on AskHistorians to comment intelligently, but I assume that if there's a major factual error, the top-voted comments will tell you so—unlike YouTube or advocacy-group studies—but minor errors will generally remain uncorrected; and that generally only a single person's expertise is applied to getting the post right.)
But none of these are in the same league as LLM output, which in my experience usually contains more falsehoods than facts.
> Currently, there's a much greater understanding that LLM's are unreliable.
Wikipedia being world-editable and thus unreliable has been beaten into everyone's minds for decades.
LLMs just popped into existence a few years ago, backed by much hype and marketing about "intelligence". No, normal people you find on the street do not in fact understand that they are unreliable. Watch some less computer literate people interact with ChatGPT - it's terrifying. They trust every word!
If you read a non-fiction book on any topic, you can probably assume that half of the information in it is just extrapolated from the authors experience.
Even scientific articles are full of inaccurate statements, the only thing you can somewhat trust are the narrow questions answered by the data, which is usually a small effect that may or may not be reproducible...
No, different media are different—or, better said, different institutions are different, and different media can support different institutions.
Nonfiction books and scientific papers generally only have one person, or at best a dozen or so (with rare exceptions like CERN papers), giving attention to their correctness. Email messages and YouTube videos generally only have one. This limits the expertise that can be brought to bear on them. Books can be corrected in later printings, an advantage not enjoyed by the other three. Email messages and YouTube videos are usually displayed together with replies, but usually comments pointing out errors in YouTube videos get drowned in worthless me-too noise.
But popular Wikipedia articles are routinely corrected by hundreds or thousands of people, all of whom must come to a rough consensus on what is true before the paragraph stabilizes.
Consequently, although you can easily find errors in Wikipedia, they are much less common in these other media.
Yes, though by different degrees. I wouldn't take any claim I read on Wikipedia, got from an LLM, saw in a AskHistorians or Hacker News reply, etc., as fact, and I would never use any of those as a source to back up or prove something I was saying.
Newspaper articles? It really depends. I wouldn't take paraphrased quotes or "sources say" as fact.
But as you move to generally more reliable sources, you also have to be aware that they can mislead in different ways, such as constructing the information in a particular way to push a particular narrative, or leaving out inconvenient facts.
And that is still accurate today. Information always contains a bias from the narrator's perspective. Having multiple sources allows one to triangulate the accuracy of information. Making people use one source of information would allow the business to control the entire narrative. It's just more of a business around people and sentiments than being bullish on science.
And they were right, right? They recognized it had structural faults that made it possible for bad data to seep in. The same is valid for LLMs: they have structural faults.
So what is your point? You seem to have placed assumptions there. And broad ones, so that differences between the two things, and complexities, the important details, do not appear.
It is, if the purpose of LLMs was to be AI. "Large language model" as a choir of pseudorandom millions converged into a voice - that was achieved, but it is by definition out of the professional realm. If it is to be taken as "artificial intelligence", then it has to have competitive intelligence.
> But my jaw kind of drops when people cite an LLM and just assume it’s correct.
Yes but they're literally told by allegedly authoritative sources that it's going to change everything and eliminate intellectual labor, so is it totally their fault?
They've heard about the uncountable sums of money spent on creating such software, why would they assume it was anything short of advertised?
> Yes but they're literally told by allegedly authoritative sources that it's going to change everything and eliminate intellectual labor
Why does this imply that they’re always correct? I’m always genuinely confused when people pretend like hallucinations are some secret that AI companies are hiding. Literally every chat interface says something like “LLMs are not always accurate”.
> Literally every chat interface says something like “LLMs are not always accurate”.
In small, de-emphasized text, relegated to the far corner of the screen. Yet, none of the TV advertisements I've seen have spent any significant fraction of the ad warning about these dangers. Every ad I've seen presents someone asking a question to the LLM, getting an answer and immediately trusting it.
So, yes, they all have some light-grey 12px disclaimer somewhere. Surprisingly, that disclaimer does not carry nearly the same weight as the rest of the industry's combined marketing efforts.
> In small, de-emphasized text, relegated to the far corner of the screen.
I just opened ChatGPT.com and typed in the question “When was Mr T born?”.
When I got the answer there were these things on screen:
- A menu trigger in the top-left.
- Log in / Sign up in the top right
- The discussion, in the centre.
- A T&Cs disclaimer at the bottom.
- An input box at the bottom.
- “ChatGPT can make mistakes. Check important info.” directly underneath the input box.
I dislike the fact that it’s low contrast, but it’s not in a far corner, it’s immediately below the primary input. There’s a grand total of six things on screen, two of which are tucked away in a corner.
This is a very minimal UI, and they put the warning message right where people interact with it. It’s not lost in a corner of a busy interface somewhere.
Maybe it's just down to different screen sizes, but when I open a new chat in chat GPT, the prompt is in the center of the screen, and the disclaimer is quite a distance away at the very bottom of the screen.
Though, my real point is we need to weigh that disclaimer, against the combined messaging and marketing efforts of the AI industry. No TV ad gives me that disclaimer.
Then we can look at people's behavior. Look at the (surprisingly numerous) cases of lawyers getting taken to the woodshed by a judge for submitting filings to a court with ChatGPT-introduced fake citations! Or someone like Ana Navarro confidently repeating an incorrect fact, and when people pushed back, saying "take it up with chat GPT" (https://x.com/ananavarro/status/1864049783637217423).
I just don't think the average person who isn't following this closely understands the disclaimer. Hell, they probably don't even really read it, because most people skip over reading most de-emphasized text in most UIs.
So, in my opinion, whether it's right next to the text-box or not, the disclaimer simply cannot carry the same amount of cultural impact as the "other side of the ledger" that are making wild, unfounded claims to the public.
> Yes but they're literally told by allegedly authoritative sources that it's going to change everything and eliminate intellectual labor, so is it totally their fault?
3rd Order Ignorance (3OI)—Lack of Process.
I have 3OI when I don't know a suitably efficient way to find out I don't know that I don't know something. This is lack of process, and it presents me with a major problem: If I have 3OI, I don't know of a way to find out there are things I don't know that I don't know.
— not from an LLM
My process: use LLMs and see what I can do with them while taking their output with a grain of salt.
But the issue of the structural fault remains. To state the phenomenon (hallucination) is not "superficial", as the root does not add value in the context.
- Symptom: "Response was, 'Use the `solvetheproblem` command'."
- Cause: "It has no method to know that there is no `solvetheproblem` command."
- Alarm: "It is suggested that it is trying to guess a plausible world through lacking wisdom and data."
- Fault: "It should have a database of what seems to be states of facts, and it should have built the ability to predict the world more faithfully to facts."
My company just broadly adopted AI. It’s not a tech company and usually late to the game when it comes to tech adoption.
I’m counting down the days when some AI hallucination makes its way all the way to the C-suite. People will get way too comfortable with AI and don’t understand just how wrong it can be.
Some assumption will come from AI, no one will check it and it’ll become a basic business input. Then suddenly one day someone smart will say “thats not true” and someone will trace it back to AI. I know it.
I assume at that point in time there will be some general directive on using AI and not assuming it’s correct. And then AI will slowly go out of favor.
People fabricate a lot too. Yesterday I spent far less time fixing issues in the far more complex and larger changes Claude Code managed to churn out than what the junior developer I worked with needed. Sometimes it's the reverse. But with my time factored in, working with Claude Code is generally more productive for me than working with a junior. The only reason I still work with a junior dev is as an investment in teaching him.
Every developer I've ever worked with has gotten things wrong. Whether you call that hallucinating or not is irrelevant. What matters is the effort it takes to fix.
On the logically practical point I agree with you (what counts in the end in the specific process you mention is the gain vs loss game), but my point was that if your assistant is structurally delirious you will have to expect a big chunk of the "loss" as structural.
> It turns out that, in Claude, refusal to answer is the default behavior
I.e., boxes that incline to different approaches to heuristic will behave differently and offer different value (to be further assessed within a framework of complexity, e.g. "be creative but strict" etc.)
And my direct experience is that I often spend less time directing, reviewing and fixing code written by Claude Code at this point than I do for a junior irrespective of that loss. If anything, Claude Code "knows" my code bases better. The rest, then, to me at least is moot.
Claude is substantially cheaper for me, per reviewed, fixed change committed. More importantly to me, it demands less of my limited time per reviewed, fixed change committed.
Having a junior dev working with me at this point wouldn't be worth it to me if it wasn't for the training aspect: We still need pipelines of people who will learn to use the AI models, and who will learn to do the things it can't do well.
But my point was: it's good that Claude has become a rightful legend in the realm of coding, but before and regardless, a candidate that told you "that class will have a .SolveAnyProblem() method: I want to believe" presents a handicap. As you said, no assistant has revealed itself to be perfect, but assistants who attempt mixing coding sessions with creative fiction writing raise alarms.
But this was true before LLMs. People would and still do take any old thing from an internet search and treat it as true. There is a known, difficult-to-remedy failure to properly adjudicate information and source quality, and you can find it discussed in research prior to the computer age. It is a user problem more than a system problem.

In my experience, with the way I interact with LLMs, they are more likely to give me useful output than not, and this is borne out by mainstream non-edge-case academic peer-reviewed work. Useful does not necessarily equal 100% correct, just as a Google search does not. I judge and vet all information, whether from an LLM, search, book, paper, or wherever.

We can build a straw person who "always" takes LLM output as true and uses it as-is, but those are the same people who use most information tools poorly, be they internet search, dictionaries, or even looking in their own files for their own work or sent mail (I say this as an IT professional who has seen the worker types from before the pre-internet days through now).

In any case, we use automobiles despite others misusing them. But only the foolish among us completely take our hands off the wheel for any supposed "self-driving" features. While we must prevent and decry the misuse by fools, we cannot let their ignorance hold us back. Let's let their ignorance help us make better tools, as they help identify more undesirable scenarios.
> My problem with it is that inbuilt into the models of all LLMs is that they'll fabricate a lot. What's worse, people are treating them as authoritative.
The same is true about the internet, and people even used to use these arguments to try to dissuade people from getting their information online (back when Wikipedia was considered a running joke, and journalists mocked blogs). But today it would be considered silly to dissuade someone from using the internet just because the information there is extremely unreliable.
Many programmers will say Stack Overflow is invaluable, but it's also unreliable. The answer is to use it as a tool and a jumping off point to help you solve your problem, not to assume that its authoritative.
The strange thing to me these days is the number of people who will talk about the problems with misinformation coming from LLMs, but then who seem to uncritically believe all sorts of other misinformation they encounter online, in the media, or through friends.
Yes, you need to verify the information you're getting, and this applies to far more than just LLMs.
Shades of grey fallacy. You have way more context clues about the information on the internet than you do with an LLM. In fact, with an LLM you have zero(-ish?).
I can peruse your previous posts to see how truthful you are, I can tell if your post has been down/upvoted, I can read responses to your post to see if you've been called out on anything, etc.
This applies tenfold in real life where over time you get to build comprehensive mental models of other people.
I have decided it must be attached to a sort of superiority complex. These types of people believe they are capable of deciphering fact from fiction but the general population isn’t so LLMs scare them because someone might hear something wrong and believe it. It almost seems delusional. You have to be incredibly self aggrandizing in your mind to think this way. If LLMs were actually causing “a problem” then there would be countless examples of humans making critical mistakes because of bad LLM responses, and that is decidedly not happening. Instead we’re just having fun ghiblifying the last 20 years of the internet.
Regardless of anything else, it's far too early to make such claims. We have to wait until people start allowing "AI agents" to make autonomous blackbox decisions with minimal supervision, since nobody has any clue what's happening.
Even if we tone down the sci-fi dystopia angle, not that many people really use LLMs in non-superficial ways yet. What I'm most afraid of is the next generation growing up without the ability to critically synthesize information on their own.
Most people - the vast majority of people - cannot critically synthesize information on their own.
But the implication of what you are saying is that academic rigour is going to be ditched overnight because of LLMs.
That’s a little bit odd. Has the scientific community ever thrown up its collective hands and said “ok, there are easier ways to do things now, we can take the rest of the decade off, phew what a relief!”
> what you are saying is that academic rigour is going to be ditched overnight
Not across all level and certainly not overnight. But a lot of children entering the pipeline might end up having a very different experience than anyone else before LLMs (unless they are very lucky to be in an environment that provides them better opportunities).
> cannot critically synthesize information on their own.
That's true, but if even fewer people try to do that, or even know where to start, it will get even worse.
> I've even caught it literally writing the wrong algorithm when asked to implement a specific and well known algorithm
Happened to me as well. Wanted it to quickly write an algorithm for standard deviation over a stream of data, which is a textbook algorithm. It did it almost right, but messed up the final formula, and the code gave wrong answers. Weird, considering correct code for that problem exists on Wikipedia.
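For reference, the textbook streaming version is Welford's online algorithm; a minimal Python sketch (single pass, constant memory):

    import math

    class RunningStats:
        # Welford's online algorithm: update the mean and the sum of squared
        # deviations (m2) one value at a time, without storing the stream.
        def __init__(self):
            self.n = 0
            self.mean = 0.0
            self.m2 = 0.0

        def add(self, x):
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)

        def stddev(self):
            # Sample standard deviation; needs at least two values.
            return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    stats = RunningStats()
    for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
        stats.add(x)
    print(stats.mean, stats.stddev())  # 5.0 and roughly 2.14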
I don't understand the point of that share. There are likely thousands of implementations of selection sort on the internet and so being able to recreate one isn't impressive in the slightest.
And all the models are identical in not being able to discern what is real or something it just made up.
No? I mean if they refused that would actually be a reasonably good outcome. The real problem is if they generally can write selection sorts but occasionally go haywire due to additional context and start hallucinating.
Because, to be blunt, I think this is total bullshit if you're using a decent model:
"I've even caught it literally writing the wrong algorithm when asked to implement a specific and well known algorithm. For example, asking it "write a selection sort" and watching it write a bubble sort instead. No amount of re-prompts pushes it to the right algorithm in those cases either, instead it'll regenerate the same wrong algorithm over and over."
I was part of preparing an offer a few weeks ago. The customer prepared a lot of documents for us - maybe 100 pages in total. The boss insisted on using ChatGPT to summarize this stuff and read only the summary. I did a longer, slower reading and caught some topics ChatGPT outright dropped. Our offer was based on the summary - and fell through because we missed these nuances.
But hey, the boss did not have to read as much as previously...
I wonder if the exact phrasing has varied from the source, but even then if "consultation partners" is doing the heavy lifting there. If it was something like "useful consultation partners", I can absolutely see value as an extra opinion that is easy to override. "Oh yeah, I hadn't thought about that option - I'll look into it further."
I imagine we're talking about it as an extra resource rather than trusting it as final in a life or death decision.
> I imagine we're talking about it as an extra resource rather than trusting it as final in a life or death decision.
I'd like to think so. Trust is also one of those non-concrete terms that have different meanings to different people. I'd like to think that doctors use their own judgement to include the output from their trained models, I just wonder how long it is till they become the default judgement when humans get lazy.
I think that's a fair assessment on trust as a term, and incorporating via personal judgement. If this was any public story, I'd also factor in breathless reporting about new tech.
Black-box decisions I absolutely have a problem with. But an extra resource considered by people with an understanding of risks is fine by me. Like I've said in other comments, I understand what it is and isn't good at, and have a great time using ChatGPT for feedback or planning or extrapolating or brainstorming. I automatically filter out the "Good point! This is a fantastic idea..." response it inevitably starts with...
Because LLM’s, with like 20% hallucination rate, are more reliable than overworked, tired doctors that can spend only one ounce of their brainpower on the patient they’re currently helping?
In fact, the phenomenon of pseudo-intelligence scares those who were hoping to get tools that limited the original problem, as opposed to potentially boosting it.
The claim seems plausible because it doesn't say there was any formal evaluation, just that some doctors (who may or may not understand how LLMs work) hold an opinion.
> What's worse, people are treating them as authoritative.
So what? People are wrong all the time. What happens when people are wrong? Things go wrong. What happens then? People learn that the way they got their information wasn't robust enough and they'll adapt to be more careful in the future.
This is the way it has always worked. But people are "worried" about LLMs... Because they're new. Don't worry, it's just another tool in the box, people are perfectly capable of being wrong without LLMs.
Being wrong when you are building a grocery management app is one thing, being wrong when building a bridge is another.
For those sensitive use cases, it is imperative we create regulation, like every other technology that came before it, to minimize the inherent risks.
In an unrelated example, I saw someone saying recently they don't like a new version of an LLM because it no longer has "cool" conversations with them, so take that as you will from a psychological perspective.
I have a hard time taking that kind of worry seriously. In ten years, how many bridges will have collapsed because of LLMs? How many people will have died? Meanwhile, how many will have died from fentanyl or cars or air pollution or smoking? Why do people care so much about the hypothetical bad effects from new technology and so little about the things we already know are harmful?
Humans bullshit and hallucinate and claim authority without citation or knowledge. They will believe all manner of things. They frequently misunderstand.
The LLM doesn’t need to be perfect. Just needs to beat a typical human.
LLM opponents aren’t wrong about the limits of LLMs. They vastly overestimate humans.
And many, many companies are proposing and implementing uses for LLM's to intentionally obscure that accountability.
If a person makes up something, innocently or maliciously, and someone believes it and ends up getting harmed, that person can have some liability for the harm.
If a LLM hallucinates something, that somone believes and they end up getting harmed, there's no accountability. And it seems that AI companies are pushing for laws & regulations that further protect them from this liability.
These models can be useful tools, but the targets these AI companies are shooting for are going to be actively harmful in an economy that insists you do something productive for the continued right to exist.
This is correct. On top of that, the failure modes of AI system are unpredictable and incomprehensible. Present day AI systems can fail on/be fooled by inputs in surprising ways that no humans would.
1. To make those harmed whole. On this, you have a good point. The desire of AI firms or those using AI to be indemnified from the harms their use of AI causes is a problem as they will harm people. But it isn't relevant to the question of whether LLMs are useful or whether they beat a human.
2. To incentivize the human to behave properly. This is moot with LLMs. There is no laziness or competing incentive for them.
That’s not a positive at all, the complete opposite. It’s not about laziness but being able to somewhat accurately estimate and balance risk/benefit ratio.
The fact that making a wrong decision would have significant costs for you and other people should have a significant influence on decision making.
That reads as "people shouldn't trust what AI tells them", which is in opposition to what companies want to use AI for.
An airline tried to blame its chatbot for inaccurate advice it gave (whether a discount could be claimed after a flight). Tribunal said no, its chatbot was not a separate legal entity.
Yeah. Where I live, we are always reminded that our conversations with insurance provider personnel over phone are recorded and can be referenced while making a claim.
Imagine a chatbot making false promises to prospective customers. Your claim gets denied, you fight it out only to learn their ToS absolves them of "AI hallucinations".
> LLM opponents aren’t wrong about the limits of LLMs. They vastly overestimate humans.
On the contrary. Humans can earn trust, learn, and can admit to being wrong or not knowing something. Further, humans are capable of independent research to figure out what it is they don't know.
My problem isn't that humans are doing similar things to LLMs, my problem is that humans can understand consequences of bullshitting at the wrong time. LLMs, on the other hand, operate purely on bullshitting. Sometimes they are right, sometimes they are wrong. But what they'll never do or tell you is "how confident am I that this answer is right". They leave the hard work of calling out the bullshit on the human.
There's a level of social trust that exists which LLMs don't follow. I can trust when my doctor says "you have a cold" that I probably have a cold. They've seen it a million times before and they are pretty good at diagnosing that problem. I can also know that doctor is probably bullshitting me if they start giving me advice for my legal problems, because it's unlikely you are going to find a doctor/lawyer.
> Just needs to beat a typical human.
My issue is we can't even measure accurately how good humans are at their jobs. You now want to trust that the metrics and benchmarks used to judge LLMs are actually good measures? So many of the LLM advocates try to pretend that you can objectively measure goodness in subjective fields by just writing some unit tests. It's literally the "Oh look, I have an Oracle Java certificate" or "AWS Solutions Architect" method of determining competence.
And so many of these tests aren't being written by experts. Perhaps the coding tests, but the legal tests? Medical tests?
The problem is LLM companies are bullshitting society on how competently they can measure LLM competence.
> On the contrary. Humans can earn trust, learn, and can admit to being wrong or not knowing something. Further, humans are capable of independent research to figure out what it is they don't know.
Some humans can, certainly. Humans as a race? Maybe, ish.
Well there are still millions that can. There is a handful of competitive LLMs, and their outputs given the same inputs are near identical in relative terms (compared to humans).
Your second point directly contradicts your first point.
In fact we do know how good doctors and lawyers are at their jobs, and the answer is "not very." Medical negligence claims are a huge problem. Claims against lawyers are harder to win - for obvious reasons - but there is plenty of evidence that lawyers cannot be presumed competent.
As for coding, it took a friend of mine three days to go from a cold start with zero dev experience to creating a usable PDF editor with a basic GUI for a specific small set of features she needed for ebook design.
No external help, just conversations with ChatGPT and some Googling.
Obviously LLMs have issues, but if we're now in the "Beginners can program their own custom apps" phase of the cycle, the potential is huge.
> As for coding, it took a friend of mine three days to go from a cold start with zero dev experience to creating a usable PDF editor with a basic GUI for a specific small set of features she needed for ebook design.
This is actually an interesting one - I’ve seen a case where some copy/pasted PDF saving code caused hundreds of thousands of subtly corrupted PDFs (invoices, reports, etc.) over the span of years. It was a mistake that would be very easy for an LLM to make, but I sure wouldn’t want to rely on chatgpt to fix all of those PDFs and the production code relying on them.
Well, humans are not a monolithic hive mind that all behave exactly the same as an "average" lawyer, doctor, etc.; that provides very obvious and very significant advantages.
> days to go from a cold start with zero dev experience
>> In fact we do know how good doctors and lawyers are at their jobs, and the answer is "not very." Medical negligence claims are a huge problem. Claims against lawyers are harder to win - for obvious reasons - but there is plenty of evidence that lawyers cannot be presumed competent.
This paragraph makes little sense. A negligence claim is based on a deviation from some reasonable standard, which is essentially a proxy for the level of care/service that most practitioners would apply in a given situation. If doctors were as regularly incompetent as you are trying to argue then the standard for negligence would be lower because the overall standard in the industry would reflect such incompetence. So the existence of negligence claims actually tells us little about how good a doctor is individually or how good doctors are as a group, just that there is a standard that their performance can be measured against.
I think most people would agree with you that medical negligence claims are a huge problem, but I think that most of those people would say the problem is that so many of these claims are frivolous rather than meritorious, resulting in doctors paying more for malpractice insurance than necessary and also resulting in doctors asking for unnecessarily burdensome additional testing with little diagnostic value so that they don’t get sued.
It's fine if it isn't perfect if whomever is spitting out answers assumes liability when the robot is wrong. But, what people want is the robot to answer questions and there to be no liability when it is well known that the robot can be wildly inaccurate sometimes. They want the illusion of value without the liability of the known deficiencies.
If LLM output is like a magic 8 ball you shake, that is not very valuable unless it is workload management for a human who will validate the fitness of the output.
I never ask a typical human for help with my work, why should that be my benchmark for using an information tool? Afaik, most people do not write about what they don't know, and if one made a habit of it, they would be found and filtered out of authoritative sources of information.
ok, but people are building determinative software _on top of them_. It's like saying "it's ok, people make mistakes, but lets build infrastructure on some brain in a vat". It's just inherently not at the point that you can make it the foundation of anything but a pet that helps you slop out code, or whatever visual or textual project you have.
It's one of those "quantity is so fascinating, let's ignore how we got here in the first place" situations.
You’re moving the goalposts. LLMs are masquerading as superb reference tools and as sources of expertise on all things, not as mere “typical humans.” If they were presented accurately as being about as fallible as a typical human, typical humans (users) wouldn’t be nearly as trusting or excited about using them, and they wouldn’t seem nearly as futuristic.
> I mean, I can ask for obscure things with subtle nuance where I misspell words and mess up my question and it figures it out.
If you're lucky it figures it out. If you aren't, it makes stuff up in a way that seems almost purposefully calculated to fool you into assuming that it's figured everything out. That's the real problem with LLM's: they fundamentally cannot be trusted because they're just a glorified autocomplete; they don't come with any inbuilt sense of when they might be getting things wrong.
I see this complaint a lot, and frankly, it just doesn't matter.
What matters is speeding up how fast I can find information. Not only will LLMs sometimes answer my obscure questions perfectly themselves, but they also help point me to the jargon I need to use to find that information online. In many areas this has been hugely valuable to me.
Sometimes you do just have to cut your losses. I've given up on asking LLMs for help with Zig, for example. It is just too obscure a language I guess, because the hallucination rate is too high to be useful. But for webdev, Python, matplotlib, or bash help? It is invaluable to me, even though it makes mistakes every now and then.
We're talking about getting work done here, not some purity dance about how you find your information the "right way" by looking in books in libraries or something. Or wait, do you use the internet? How very impure of you. You should know, people post misinformation on there!
> Yeah but if your accountant bullshits when doing your taxes, you can sue them.
What is the point of limiting delegation to such an extreme dichotomy, as opposed to getting more things done?
The vast majority of useful things we delegate, or do for others ourselves, are not as well specified, or as legally liable for any imperfections, as an accountant doing accounting.
Let's try it this way: give me one or two prompts that you personally have had trouble with, in terms of hallucinated output and lack of awareness of potential errors or ambiguity. I have paid accounts on all the major models except Grok, and I often find it interesting to probe the boundaries where good responses give way to bad ones, and to see how they get better (or worse) between generations.
Sounds like your experiences, along with zozbot234's, are different enough from mine that they are worth repeating and understanding. I'll report back with the results I see on the current models.
I am so confused too. I hold these beliefs at the same time, and I don't feel they contradict each other, but apparently for many people some of them do:
- LLMs are a miraculous technology that are capable of tasks far beyond what we believed would be achievable with AI/ML in the near future. Playing with them makes me constantly feel like "this is like sci-fi, this shouldn't be possible with 2025's technology".
- LLMs are fairly clueless for many tasks that are easy enough for humans, and they are nowhere near AGI. It's also unclear whether they scale up towards that goal. They are also worse programmers than people make them to be. (At least I'm not happy with their results.)
- Achieving AGI doesn't seem impossibly unlikely any more, and doing so is likely to be an existentially disastrous event for humanity, and the worst fodder of my nightmares. (Also in the sense of an existential doomsday scenario, but even just the thought of becoming... irrelevant is depressing.)
Having one of these beliefs makes me the "AI hyper" stereotype, another makes me the "AI naysayer" stereotype and yet another makes me the "AI doomer" stereotype. So I guess I'm all of those!
> but even just the thought of becoming... irrelevant is depressing
In my opinion, there can exist no AI, person, tool, ultra-sentient omniscient being, etc. that would ever render you irrelevant. Your existence, experiences, and perception of reality are all literally irreplaceable, and (again, just my opinion) inherently meaningful. I don't think anyone's value comes from their ability to perform any particular feat to any particular degree of skill. I only say this because I had similar feelings of anxiety when considering the idea of becoming "irrelevant", and I've seen many others say similar things, but I think that fear is largely a product of misunderstanding what makes our lives meaningful.
I guess that Sabine's beef with LLMs is that they are hyped as a legit "human-level assistant" kind of thing by the business people, which they clearly aren't yet. Maybe I've just managed to... manage my expectations?
That's on her then for fully believing what marketing and business execs are 'telling her' about LLMs. Does she get upset when she buys a coke around Christmas and her life doesn't become all warm and fuzzy with friendliness and cheer all around?
Seems like she's been given a drill with a flathead bit, and just complains for months on end that it often fails (she didn't charge the drill) or gives her useless results (she uses it on Phillips heads). How about figuring out what works and what doesn't, and adjusting your use of the tool accordingly? If she is a painter, don't blame the drill for messing up her painting.
I kinda agree. But she seems smart and knowledgeable. It's kinda disappointing, like... She should know better. I guess it's the Gell-Mann amnesia effect once again.
Back when handwriting recognition was a new thing I was greatly impressed by how good it was. This was primarily because, being an engineer, I knew how difficult the problem is to solve. 90% recognition seemed really good to me.
When I tried to use the technology, that 90% meant 1 out of every 10 things I wrote was incorrect. If it had been a keyboard I would have thrown it in the trash. That is where my Palm ended up.
People expect their technology to do things better not almost as well as a human. Waymo with LIDAR hasn't killed people. Tesla, with camera only, has done so multiple times. I will ride in a Waymo never in a Tesla self driving car.
Anyone who doesn't understand this either isn't required to use the utility it provides or has no idea how to prompt it correctly. My wife is a bookkeeper. There are some tasks that are a pain in the ass without writing some custom code. In her case, we just saved her about 2 hours by asking Claude to do it. It wrote the code, applied the code to a CSV we uploaded, and gave us exactly what we needed in 2 minutes.
>Anyone who doesn't understand this either isn't required to use the utility it provides or has no idea how to prompt it correctly.
Almost every counter-criticism of LLMs boils down to:
1. you're holding it wrong
2. Well, I use it at $DAYJOB and it works great for me! (And $DAYJOB is software engineering.)
I'm glad your wife was able to save 2 hours of work, but forgive me if that doesn't translate to the trillion dollar valuation OpenAI is claiming. It's strange you don't see the inherent irony in your post. Instead of your wife just directly uploading the dataset and a prompt, she first has to prompt it to write code. There are clear limitations and it looks like LLMs are stuck at some sort of wall.
When computers/the internet first came about, there were (and still are!) people who would struggle with basic tasks. Without knowing the specific task you are trying to do, it's hard to judge whether it's a problem with the model or with you.
I would also say that prompting isn't as simple as it is made out to be. It is a skill in itself and requires you to be a good communicator. In fact, I would say there is a reasonable chance that even if we end up with AGI-level models, a good chunk of people will not be able to use them effectively because they can't communicate requirements clearly.
So it's a natural language interface, except it can only be useful if we stick to a subset of natural language. Then we're stuck trying to reverse-engineer an undocumented, non-deterministic API, one that will keep changing under whatever you build on top of it. That is a pretty horrid value proposition.
Short of it being able to mind read, you need to communicate with it in some way. No different from the real world where you'll have a harder time getting things done if you don't know how to effectively communicate. I imagine for a lot of popular use-cases, we'll build a simpler UX for people to click and tap before it gets sent to a model.
Boiling down to a couple cases would be more useful if you actually tried to disprove those cases or explain why they're not good enough.
> It's strange you don't see the inherent irony in your post. Instead of your wife just directly uploading the dataset and a prompt, she first has to prompt it to write code. There are clear limitations and it looks like LLMs are stuck at some sort of wall.
What's ironic about that? That's such a tiny imperfection. If that's anything near the biggest flaw then things look amazing. (Not that I think it is, but I'm not here to talk about my opinion, I'm here to talk about your irony claim.)
>Boiling down to a couple cases would be more useful if you actually tried to disprove those cases or explain why they're not good enough.
This reply is 4 comments deep into such cases, and the OP is about a well-educated person who describes their difficulties.
>What's ironic about that? That's such a tiny imperfection.
I'd argue it's not tiny - it highlights the limitations of LLMs. LLMs excel at writing basic code but seem to struggle, or are untrustworthy, outside of those tasks.
Imagine generalizing his case: his wife goes to work and tells the other bookkeepers "ChatGPClaudeSeek is amazing, it saved 2 hours for me". A coworker, married to a lawyer instead of a software engineer, hears this, tries it for himself, and comes up short. Returning to work the next day and talking about his experience, he is told: "oh, you weren't holding it right, ChatGPClaudeSeek can't do the work for you, you have to ask it to write code, that you must then run". Turns out he needs an expert to hold it properly and from the coworker's point of view he would probably need to hire an expert to help automate the task, which will likely only be marginally less expensive than it was 5 years ago.
From where I stand, things don't look amazing; at least not as amazing as the fundraisers have claimed. I agree that LLMs are awesome tools - but I'm evaluating them against the potential future where OpenAI is worth a trillion dollars and is replacing every job. You call it a tiny imperfection, but that comes across as myopic to me - large swaths of industries can't effectively use LLMs! How is that tiny?
> Turns out he needs an expert to hold it properly and from the coworker's point of view he would probably need to hire an expert to help automate the task, which will likely only be marginally less expensive than it was 5 years ago.
The LLM wrote the code, then used the code itself, without needing a coder around. So the only negative was needing to ask it specifically to use code, right? In that case, with code being the thing it's good at, "tell the LLM to make and use code" is going to be in the basic tutorials. It doesn't need an expert. It really is about "holding it right" in a non-mocking way, the kind of instructions you expect to go through for using a new tool.
If you can go through a one hour or less training course while only half paying attention, and immediately save two hours on your first use, that's a great return on the time investment.
It's definitely a tech that's here to stay, unlike blockchain/NFTs.
But I share the confusion about why people are still bullish on it.
The current valuation exists because the market thinks it can write code like a senior engineer and deliver AGI, because that's how the LLM providers market it.
I'm not even certain if they'll be ubiquitous after the venture capital investments are gone and the service needs to actually be priced without losing money, because they're (at least currently) mostly pretty expensive to run.
There seems to be a widely held misconception that company valuations have any basis in the underlying fundamentals of what the companies do. This is not and has not been the case for several years. The US stock market’s darlings are Kardashians, they are valuable for being valuable the way the Kardashians are famous for being famous.
In markets, perception is reality, and the perception is that these companies are innovative. That’s it.
NFT is still a great tool if you want a bunch of unique tokens as part of a blockchain app. ERC-721 was proven a capable protocol in a variety of projects. What it isn't, and never will be, is an amazing investment opportunity, or a method to collect cool rare apes and go to yacht parties.
LLMs will settle in and have their place too, just not in the forefront of every investors mind.
I am more than happy to pay for access to LLMs, and models continue to get smaller and cheaper. I would be very surprised if they are not far more widely used in 5 or 10 years time than they are today.
None of that means that the current companies will be profitable or that their valuations are anywhere close to justified though. The future could easily be "Open-weight models are moderately useful for some niches, no-name cloud providers charge slightly higher than the cost of electricity to use them at low profit margins".
Dot-com boom/bubble all over again. A whole bunch of the current leaders will go bust. A new generation of companies will take over, actually focused on specific customer problems and growing out of profitable niches.
The technology is useful, for some people, in some situations. It will get more useful for more people in more situations as it improves.
Current valuations are too high (Gartner hype cycle), after they collapse valuations will be too low (again, hype cycle), then it'll settle down and the real work happens.
The existing tech giants will just hoover up all the niche LLM shops once the valuations deflate somewhat.
There's almost no chance any one of these shops stays truly independent, unless propped up by a state-level actor (China/EU).
You might have some consulting/service companies that will promise to tailor big models to your specific needs, but they will be valued accordingly (nowhere near billions).
Yeah, that's probably true, the same happened after the dot-com bubble burst. From about 2005-15 if you had a vaguely promising idea and a few engineers you could get acqui-hired by a tech giant easily. The few profitable ones that refused are now middle-sized businesses doing OK (nowhere near billions).
I don't know if the survivors are going to be in consulting - there is some kind of LLM-based product capability here, and you could conceivably see a set of companies emerge that build LLM-based products. But it'll probably be a bit different, like the mobile app boom was a bit different from the web boom.
That's been the 'endgame' of technology improvements since the industrial revolution - there are many industries that mechanized, replaced nearly their entire human workforce, and were never terribly profitable. Consider farming - in developed countries, they really did replace like 98% of the workforce with machines. For every farm that did so, so did all of their competitors, and the increased productivity caused the price of their crops to fall. Cheap food for everyone, but no windfall for farmers.
If machines can easily replace all of your workers, that means other people's machines can also replace your workers.
Yeah, the overblown hype is a feature of the hype cycle. The same was true for the web - it was going to replace retail, change the way we work and live, etc. And yes, all of that has happened, but it took 30 years and COVID to make it happen.
LLMs might lead to AGI. Eventually.
Meanwhile every company that is spruiking that, and betting their business that that's going to happen before they run out of VC funding, is going to fail.
I think it will go in the opposite direction. Very massive closed-weight models that are truly miraculous and magical. But that would be sad because of all the prompt pre-processing that will prevent you from doing much of what you'd really want to do with such an intelligent machine.
I expect it to eventually be a duopoly like android and iOS. At world scale, it might divide us in a way that politics and nationalities never did. Humans will fall into one of two AI tribes.
Except that we've seen that bigger models don't really scale well in accuracy/intelligence; just look at GPT-4.5. Intelligence scales logarithmically with parameter count; the extra parameters are mostly good for baking in more knowledge so you don't need to RAG everything.
Additionally, you can use reasoning model thinking with non-reasoning models to improve output, so I wouldn't be surprised if the common pattern was routing hard queries to reasoning models to solve at a high level, then routing the solution plan to a smaller on device model for faster inference.
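A minimal sketch of that routing idea, with made-up model names and a stand-in call_model() helper rather than any real API, could look something like this:

    # Hypothetical sketch of query routing: hard queries go to a large reasoning
    # model for a high-level plan, then a small local model executes the plan.
    def call_model(model: str, prompt: str) -> str:
        # Stand-in: replace with whatever inference backend you actually use.
        raise NotImplementedError

    def looks_hard(query: str) -> bool:
        # Crude heuristic; in practice this could itself be a small classifier.
        return len(query) > 500 or any(w in query.lower() for w in ("prove", "design", "plan"))

    def answer(query: str) -> str:
        if looks_hard(query):
            plan = call_model("big-reasoning-model", f"Outline a step-by-step plan for: {query}")
            return call_model("small-local-model",
                              f"Follow this plan to answer the question.\nPlan:\n{plan}\nQuestion: {query}")
        return call_model("small-local-model", query)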
Exactly. If some company ever does come up with an AI that is truly miraculous and magical the very last thing they'll do is let people like you and me play with it at any price. At best, we'd get some locked down and crippled interface to heavily monitored pre-approved/censored output. My guess is that the miracle isn't going to happen.
If I'm wrong though and some digital alchemy finally manages to turn our facebook comments into a super-intelligence we'll only have a few years of an increasingly hellish dystopia before the machines do the smart thing and humanity gets what we deserve.
By the time the capital runs out, I suspect we'll be able to get open models at the level of current frontier and companies will buy a server ready to run it for internal use and reasonable pricing. It will be useful but a complete commodity.
I know folks now that are selling, basically, RAG on Llamas, "in a box". Seems like a bunch of mid-level managers at SMEs are ready to burn budget on hype (to me). Gotta get something deployed in the hype cycle for the quarterly bonus.
I think we can already get open-weight frontier class models today. I've run Deepseek R1 at home, and it's every bit as good as any of the ChatGPT models I can use at work.
Which companies? Google and Microsoft are only up a little over the past several years, and I doubt much of their valuation is coming from LLM hype. Most of the discussions about x.com say it's worth substantially less than some years ago.
I feel like a lot of people mean that OpenAI is burning through venture capital money. It's debatable, but it's a huge jump to go from that to thinking it's going to crash the stock market (OpenAI isn't even publicly traded).
The "Magnificent Seven" stocks (Apple, Amazon, Alphabet, Meta, Microsoft, Nvidia, and Tesla) were collectively up >60% last year and are now 30% of the entire S&P500. They are all heavily invested in AI products.
I just checked the first two, Apple and Amazon, and they're trading 28% and 23% higher than they were 3 years ago. Annualized returns from the S&P 500 have been a little over 10%. Some of that comes from dividends, but Apple and Amazon pay out very little in dividends.
I'm not going to check all of the companies, but at least looking at the first two, I'm not really seeing anything out of the ordinary.
Currently, Nvidia enjoys a ton of the value capture from the LLM hype. But that's a weird state of affairs and once LLM deployments are less dependent on Nvidia hardware, the value capture will likely move to software companies. Or the LLM hype will reduce to the point that there isn't a ton of value to capture here anymore. This tech may just get commoditized.
Nvidia is trading below its historical P/E from pre-AI times at this point. This is just on confirmed revenue, and its profitability keeps increasing. NVIDIA is undervalued right now.
Sure, as long as it keeps selling $130B worth of GPUs each year. Which is entirely predicated on the capital investment in machine learning attracting revenue streams that are still imaginary at this point.
> None of that means that the current companies will be profitable ... The future could easily be "Open-weight models are moderately useful for some niches, no-name cloud providers charge slightly higher than the cost of electricity to use them at low profit margins".
They just need to stay a bit ahead of the open source releases, which is basically the status quo. The leading AI firms have a lot of accumulated know-how wrt. building new models and training them, that the average "no-name cloud" vendor doesn't.
> They just need to stay a bit ahead of the open source releases, which is basically the status quo
No, OpenAI alone needs approximately $5B of additional cash each and every year.
I think Claude is useful. But if they charged enough money to be cashflow positive, it's not obvious enough people would think so. Let alone enough money to generate returns to their investors.
I don't doubt the first part, but how true is the second?
Is there a shortage of React apps out there that companies are desperate for?
I'm not having a go at you--this is a genuine inquiry.
How many average people are feeling like they're missing some software that they're able to prompt into existence?
I think if anything, the last few years have revealed the opposite: that there's a huge surplus of people in the greater software business relative to the demand when money isn't cheap.
I think anyone in the "average" range of skill looking for a job can attest to the difficulties in finding a new/any job.
I think there is plenty of demand for software but not enough economic incentive to fulfill every single demand. Even for the software that is being worked on, we are constantly prioritizing between the features we need or want, deciding whether to write our own vs modifying something open source etc etc. You can also look at stuff like electron apps which is a hack to reduce programmer dev time and time to market for cross platform apps. Ideally, you should be writing highly performant native apps for each.
IMO if coding models get good enough to replace devs, we will see an explosion of software before it flattens out.
We're several years in now, and have lots of A:B comparisons to study across orgs that allowed and prohibited AI assistants. Is one of those groups running away with massive productivity gains?
Because I don't think anybody's noticed that yet. We see layoffs that make sense on their own after a boom, and they cut across AI-friendly and -unfriendly orgs. But we don't seem to see anybody suddenly breaking out with 2x or 5x or 10x productivity gains on actual deliverables. In contrast, the enshittening just seems to be continuing as it has for years, and the pace of new products and features is holding steady. No?
> We're several years in now, and have lots of A:B comparisons to study across orgs that allowed and prohibited AI assistants. Is one of those groups running away with massive productivity gains?
You mean... two years in? Where was the internet 2 years into it?
You’re not making the argument you think you’re making when you ask “Where was the [I]nternet 2 years into it?”
You may be intending to refer to 1971 (about two years after the creation of ARPANet) but really the more accurate comparison would be to 1995 (about two years since ISPs started offering SLIP/PPP dialup to the general public for $50/month or less).
And I think the comparison to 1995, the year of the Netscape IPO and URLs starting to appear in commercials and on packaging for consumer products, is apt: LLMs have been a research technology for a while; it’s their availability to the general public that’s new in the last couple of years. Yet while the scale of hype is comparable, the products aren’t: LLMs still don’t do anything remotely like what their boosters claim, and have done nothing to justify the insane amounts of money being poured into them. With the Internet, however, there were already plenty of retailers starting to make real money doing electronic commerce by 1995, not just by providing infrastructure and related services.
It’s worth really paying attention to Ed Zitron’s arguments here: The numbers in the real world just don’t support the continued amount of investment in LLMs. They’re a perfectly fine area of advanced research but they’re not a product, much less a world-changing one, and they won’t be any time soon due to their inherent limitations.
They're not a product? Isn't Cursor on the leaderboard for fastest to $100m ARR? What about just plain usage or dependency? College kids are using Chrome extensions that direct their searches to ChatGPT by default. I think your connection to internet uptake is a bit weak, and then you've ended by basically saying too much money is being thrown at this stuff, which is quite disconnected from the start of your argument.
I think it's pretty fair to say that they have close to doubled my productivity as a programmer. My girlfriend uses ChatGPT daily for her work, which is not "tech" at all. It's fair to be skeptical of exactly how far they can go but a claim like this is pretty wild.
Both your and her usage is currently being subsidized by venture capital money.
It remains to be seen how viable this casual usage actually is once this money dries up and you actually need to pay per prompt.
We'll just have to see where the pricing will eventually settle, before that we're all just speculating.
> And I think the comparison to 1995, the year of the Netscape IPO and URLs starting to appear in commercials and on packaging for consumer products, is apt
My grandfather didn’t care about these and you don’t care about LLMs, we get it
> They’re a perfectly fine area of advanced research but they’re not a product
No, it lets good engineers parallelize work. I can be adding a route to the backend while Cline with Sonnet 3.7 adds a button to the frontend. Boilerplate work that would take 20-30 minutes is handled by a coding agent. With Claude writing some of the backend routes with supervision, you've got a very efficient workflow. I do something like this daily in a 80k loc codebase.
I look forward to good standard integrations to assign a ticket to an agent and let it go through CI and put up a preview deploy & PR. I think there are lots of smaller issues that could be raised and sorted without much intervention.
Even if the VC-backed companies jacked up their prices, the models that I can run on my own laptop for "free" now are magical compared to the state of the art from 2 years ago. Ubiquity may come from everyone running these on their own hardware.
Takes like yours are just crazy given the pace of things. We can argue all day about whether people are "too bullish" or about the literal market size of enterprise AI, but truly, absolutely no one knows how good these things will get and the problems they'll overcome in the next 5 years. You saying "I am confused why people are still bullish" is implicitly building in some huge assumptions about the near future.
Most “AI” companies are simply wrapping the ChatGPT API in some form. You can tell from the job posts.
They aren’t building anything themselves. I find this to be disingenuous at best, and a sign to me of a bubble.
I also think that re-branding Machine Learning as AI to also be disingenuous.
These technologies of course have their use cases and excel at some things, but this isn’t the ushering in of actual, sapient intelligence, which for the majority of the term’s existence was the de facto agreed standard for the term “AI”. This technology lacks the actual markers of what is generally accepted as intelligence to begin with.
Remember the quote that IBM thought there would be a total market for maybe 10 or 15 computers in the entire world? They were giant, and expensive, and very limited in application.
A popular myth; it seems to have been made up from a way-less-interesting statement about a single specific model of computer during a 1953 stockholder meeting:
> IBM had developed a paper plan for such a machine and took this paper plan across the country to some 20 concerns that we thought could use such a machine. I would like to tell you that the [IBM 701] machine rents for between $12,000 and $18,000 a month, so it was not the type of thing that could be sold from place to place. But, as a result of our trip, on which we expected to get orders for five machines, we came home with orders for 18.
And that might have been true for a period of time. Advancements made it so they could become smaller and more efficient, and opened up a new market.
LLMs today feel like the former, but are being marketed as the latter. Fully believe that advancements will make them better, but in their current state they're being touted for their possibilities, not their actual capabilities.
I'm for using AI now as the tool they are, but AI is a while off taking senior development jobs. So when I see them being hyped for doing that it just feels like a hype bubble.
Tesla is valued based on the hope that it'll be the first to full self-driving cars. I don't think stock markets need to make sense; you invest in things that, if they pan out, could have huge growth. That's why LLMs are being invested in: alternatives will make you some ROI, but if LLMs do break through to major disruption in even a handful of large markets, your ROI will be huge.
That's not really true. Just the entertainment value alone is already causing OpenAI to rate limit its systems, and they're buying up significant amounts of NVIDIA's capacity, and NVIDIA itself is buying up significant portions of the entire world's chip-making budget. Even if just limited to entertainment, the value is immense, apparently.
That's a funny comparison. I can and do use cryptocurrency to pay for web hosting, a VPN and a few other things, as it's become the native currency of the internet. I love LLMs too, but while I agree with the parent comment that it's inevitable they'll be replaced with something better, Bitcoin seems to be sticking around for the long, long term.
In my office most people use chatGPT or a similar LLM every day. I don't know a single coworker that's ever used a cryptocurrency. One guy has bought some crypto stocks.
> The current valuation exists because the market thinks it can write code like a senior engineer and deliver AGI, because that's how the LLM providers market it.
No it's not. If it was valued for that it'd be at least 10X what it is now.
While it could be said that LLMs are in the 'peak of inflated expectations', blockchain is definitely still in the 'trough of disillusionment'. Even if there was a way for blockchain to affordably facilitate everyday transactions without destroying the planet and somehow sideloading into government acceptance, it's not clear that there would be anything novel enough to motivate people to use it vs a bank - beyond a complete collapse of the banking system.
Blockchain is here to stay; this is way past the point of "believing in the tech" - recently a wss:// order-book exchange (Hyperliquid) crossed $1T in volume traded, and they started in 2023.
Blockchains are becoming real-time data structures where everyone has admin-level read-only access to everything.
HN doesn't like blockchain. They had the chance to get in very early and now they're salty. I first heard about bitcoin on HN, before Silk Road made headlines.
It's more like duct-taping a VR headset to your head, calibrating your environment to a bunch of cardboard boxes and walls, and calling it a holodeck. It actually kinda works until you push at it too hard.
It reminds me a lot of when I first started playing No Man's Sky (the video game). Billions of galaxies! Exotic, one of a kind life forms on every planet! Endless possibilities! I poured hundreds of hours into the game! But, despite all the variety and possibilities, the patterns emerge, and every 'new' planet just feels like a first-person fractal viewer. Pretty, sometimes kinda nifty, but eventually very boring and repetitive. The illusion wore off, and I couldn't really enjoy it anymore.
I have played with a LOT of models over the years. They can be neat, interesting, and kinda cool at times, but the patterns of output and mistakes shatter the illusion that I'm talking to anything but a rather expensive auto-complete.
I'm in the same boat and I think it boils down to this: some people are actually quite passive, while others are more active in their use of technology.
It'd take more time for me to flesh this out than I want to give, but the basic idea is I am not just sitting there "expecting things". I've been puzzled too at why so many people don't seem to get it or are so frustrated like this lady, and in my observation this is their common element. It just looks very passive to me, the way they seem to use the machines and expect a result to be "given" to them.
PS. It reminds me very strongly of how our parent generation uses computers. Like the whole way of thinking is different, I cannot even understand why they would act certain ways or be afraid of acting in other ways, it's like they use a different compass or have a very different (and wrong) model in their head of how this thing in front of them works.
> And people just sit around, unimpressed, and complain that ... what ... it isn't a perfect superintelligence that understands everything perfectly?
IMO there are two distinct reasons for this:
1. You've got the Sam Altmans of the world claiming that LLMs are or nearly are AGI and that ASI is right around the corner. It's obvious this isn't true even if LLMs are still incredibly powerful and useful. But Sam doing the whole "is it AGI?" dance gets old really quickly.
2. LLMs are an existential threat to basically every knowledge worker job on the planet. Peoples' natural response to threats is to become defensive.
I’m not sure how anyone can claim number 2 is true, unless it’s someone who is a programmer doing mostly grunt code and thinks every knowledge worker job is similar.
Just off the top of my head there are plenty of knowledge worker jobs where the knowledge isn’t public, nor really in written form anywhere. There just simply wouldn’t be anything for AI to train on.
> LLMs are an existential threat to basically every knowledge worker job on the planet.
Given the typical problems of LLMs, they are not. You still need a human to check the results. It's like FSD: impressive when it works, bad when it doesn't, and scary because you never know beforehand when it's going to fail.
Yeah, the vast majority of what I spend my time on in a day isn’t something an LLM can help with.
My wife and I both work on and with LLMs and they seem to be, like… 5-10% productivity boosters on a good day. I’m not sure they’re even that good averaged over a year. And they don’t seem to be getting a lot better in ways that change that. Also, they’re that good if you’re good at using them and I can tell you most people really, really are not.
I remember when it was possible to be “good at Google”. It was probably a similar productivity boost. I was good at Google. Most (like, over 95% of) people were not, and didn’t seem to be able to get there, and… also remained entirely employable despite that.
how much time do I need to devote to see anything but garbage?
For reference, I program systems code in C/C++ in a large, proprietary codebase.
My experiences with OpenAI (a year ago or more), and more recently Cursor, Grok-v3 and Deepseek-r1, were all failures. The latter two started out OK and got worse over time.
What I haven't done is ask "AI" to whip up a more standard application. I have some ideas (an ncurses frontend to p4 written in Python, similar to tig, for instance), but haven't gotten around to it.
I want this stuff to work, but so far it hasn't. Now I don't think "programming" a computer in english is a very good idea anyway, but I want a competent AI assistant to pair program with. To the degree that people are getting results, to me it seems they are leveraging very high-level APIs/libraries of code which are not written by AI and solving well-solved, "common" problems(simple games, simple web or phone apps). Sort of like how people gloss over the heavy lifting done by language itself when they praise the results from LLMs in other fields.
I know it eventually will work. I just don't know when. I also get annoyed by the hype of folks who think they can become software engineers because they can talk to an LLM. Most of my job isn't programming. Most of my job is thinking about what the solution should be, talking to other people like me in meetings, understanding what customers really want beyond what they are saying, and tracking what I'm doing in various forms(which is something I really do want AI to help me with).
Vibe coding is aptly named because it's sort of the VB6 of the modern era. Holy cow! I wrote a Windows GUI App!!! It's letting non-programmers and semi-programmers (the "I write glue code in Python to munge data and API ins/outs" crowd) create usable things. Cool! So did spreadsheets. So did HyperCard. Andrej tweeting that he made a phone app was kinda cool but also kinda sad. If this is what the hundreds of billions spent on AI (and my bank account thanks you for that) delivers, then the bubble is going to pop soon.
I think there is a big problem of expectations. People are told that it is great for software development, so they try to use it on big existing software projects, and it sucks.
Usually that's because of context: LLMs are not very good at understanding a very large amount of context, but if you don't give LLMs enough context, they can't magically figure it out on their own. This relegates AI to only really being useful for pretty self-contained examples where the amount of code is small, and you can provide all the context it needs to do its job in a relatively small amount of text (few thousand words or lines of code at most).
That's why I think LLMs are only useful right now in real-world software development for things like one-off functions, new prototypes, writing small scripts, or automating lots of manual changes you have to do. For example, I love using o3-mini-high to take existing tests that I have and modify them to make a new test case. Often this involves lots of tiny changes that are annoying to write, and o3-mini-high can make those changes pretty reliably. You just give it a TODO list of changes, and it goes ahead and does it. But I'm not asking these models how to implement a new feature in our codebase.
I think this is why a lot of software developers have a bad view of AI. It's just not very good at the core software development work right now, but it's good enough at prototypes to make people freak out about how software development is going to be replaced.
That's not to mention that often when people first try to use LLMs for coding, they don't give the LLMs enough context or instructions to do well. Sometimes I will spend 2-3 minutes writing a prompt, but I often see other people putting the bare minimum effort into it, and then being surprised when it doesn't work very well.
Serious question, as someone who has also tried these things out and not found them very useful in the context of working on a large, complex codebase in neither Python nor JavaScript: when I imagine the amount of time it would take me to select some test cases, copy and paste them, and then think of a todo list or prompt to generate another case, even assuming the output is perfect, I feel like I’m getting close to the amount of time and mental effort it would take me to just write the test. In a way, having to ask in English for what I want in code adds an additional barrier for me: rather than just doing the thing, I have to also think of a promptable description. Is that not a problem? Is it just fast enough that it doesn’t matter? What’s the deal?
I mean, for me personally, I am writing out the English TODO list while I am figuring out exactly what changes I need to make. So, the thinking and writing the prompt take up the same unit of time.
And in terms of time saved, if I am just changing string constants, it’s not going to help much. But if I’m restructuring the test to verify things in a different way, then it is helpful. For example, recently I was writing tests for the JSON output of a program, using jq. In this case, it’s pretty easy to describe the tests I want to make in English, but translating that to jq commands is annoying and a bit tedious. But o3-mini-high can do it for me from the English very well.
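To make this concrete, the kind of check I mean, sketched in Python instead of jq (with a made-up command name and JSON fields), is roughly:

    import json
    import subprocess
    import unittest

    class JsonOutputTest(unittest.TestCase):
        # Hypothetical example: run a CLI tool and assert on its JSON output,
        # the same kind of check a short jq expression states more tersely.
        def test_report_lists_three_items(self):
            out = subprocess.run(
                ["mytool", "report", "--format=json"],  # made-up command
                capture_output=True, text=True, check=True,
            ).stdout
            data = json.loads(out)
            self.assertEqual(len(data["items"]), 3)
            self.assertTrue(all("id" in item for item in data["items"]))

    if __name__ == "__main__":
        unittest.main()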
Annoying to do myself, but easy to describe, is the sweet spot. It is definitely not universally useful, but when it is useful it can save me 5 minutes of tedium here or there, which is quite helpful. I think for a lot of this, you just have to learn over time what works and what doesn't.
Thanks for the reply, that makes sense. jq syntax is one of those things that I’m just familiar enough with to remember what’s possible but not how to do it, so I could definitely see an LLM being useful for that.
Maybe one of my problems is that I tend to jump into writing simple code or tests without actually having the end point clearly in mind. Often that works out pretty well. When it doesn’t, I’ll take a step back and think things through. But when I’m in the midst of it, it feels like being interrupted almost, to go figure out how to say what I want in English.
Will definitely keep toying with it to see where I can find some utility.
That definitely makes a lot of sense. I think if you are coding in a flow state on something, and LLMs interrupt that, then you should avoid them for those cases.
The areas that I've found LLMs work well for are usually small simple tasks I have to do where I would end up Googling something or looking at docs anyway. LLMs have just replaced many of these types of tasks for me. But I continue to learn new areas where they work well, or exceptions where they fail. And new models make it a moving target too.
> I think if you are coding in a flow state on something, and LLMs interrupt that, then you should avoid them for those cases.
Maybe that's why I don't like them. I'm always in a flow state, or reading docs and/or a lot of code to understand something. By the time I'm typing, I already know what exactly to write, and thanks to my vim-fu (and emacs-fu), getting it done is a breeze. Then comes the edit-compile-run, or edit-test cycle, and by then it's mostly tweaks.
I get why someone would generate boilerplate, but most of the time, I don't want the complete version from the get go. Because later changes are more costly, especially if I'm not fully sure of the design. So I want something minimal that's working, then go work on things that are dependent, then get back when I'm sure of what the interface should be. I like working iteratively which then means small edits (unless refactoring). Not battling with a big dump of code for a whole day to get it working.
Yeah, I think it matters a lot what type of work you do. I have to jump between projects a lot that are all in different languages with a lot of codebases I'm not deeply familiar with. So for me, LLMs are really useful to get up-to-speed on the knowledge I need to work on new projects.
If I've got a clear idea of what I want to write, there's no way I'm touching an LLM. I'm just going to smash out the code for exactly what I need. However, often I don't get that luxury as I'll need to learn different file system APIs, different sets of commands, new jargon, different standard libraries for the new languages, new technologies, etc...
It does an OK job with C#, but it's generally outdated code, e.g. [Required] as an attribute rather than the required keyword. Plus it occasionally generates some unnecessary constructors.
Mostly I use it for stupid template stuff, for which it isn't bad. It's not the second coming, but it definitely speeds you up.
> Most of my job is thinking about what the solution should be, talking to other people like me in meetings, understanding what customers really want beyond what they are saying, and tracking what I'm doing in various forms
None of this is particularly unique to software engineering. So if someone can already do this and add the missing component with some future LLM why shouldn’t they think they can become a software engineer?
Yeah I mean, if you can reason about, say, how an automobile engine works, then you can reason about how a modern computer works too, right? If you can discuss the tradeoffs in various engine design parameters then surely you understand Amdahl's law, caching strategies of a modern CPU, execution pipelining, etc... We just need to give those auto guys an LLM and then they can do systems software engineering, right?
Did you catch the sarcasm there?
Are you a manager by any chance? The non-coding parts of my job largely require domain experience. How does an LLM provide you with that?
If your mind has trouble expanding outside the domain of "use this well known tool to do something that has already been done" then no amount of improvements will free you from your belief that chatbots are glorified autocomplete.
I hear you, I'm tired of getting people that don't care to care. It's the people that should know how cool this stuff is and don't - they frustrate me!
You people frustrate me because you don't listen when I say that I've tried to use AI to help with my job and it fails horribly in every way. I see that it is useful to you, and that's great, but that doesn't make it useful for everybody... I don't understand why you must have everyone agree with you, and why it "tires" you out to hear other people's contradicting opinions. It feels like a religion.
I mean, it is trivial to show that it can do things literally impossible even 5 years ago. And you don't acknowledge that fact, and that's what drives me crazy.
It's like showing someone from 1980 a modern smart phone and them saying, yeah but it can't read my mind.
I'm not trying to pick on you or anything, but at the top of the thread you said "I mean, I can ask for obscure things with subtle nuance where I misspell words and mess up my question and it figures it out" and now you're saying "it is trivial to show that it can do things literally impossible even 5 years ago"
This leads me to believe that the issue is not that llm skeptics refuse to see, but that you are simply unaware of what is possible without them--because that sort of fuzzy search was SOTA for information retrieval and commonplace about 15 years ago (it was one of the early accomplishments of the "big data/data science" era) long before LLMs and deepnets were the new hotness.
This is the problem I have with the current crop of AI tools: what works isn't new and what's new isn't good.
It's also a red flag to hear "it is trivial to show that it can do things literally impossible even 5 years ago" 10 comments deep without anybody doing exactly that...
Are people really this hung up on the term “AI”? Who cares? The fact that this is a shockingly useful piece of technology has nothing to do with what it’s called.
Because the AI term makes people anthropomorphize those tools.
They "hallucinate", they "know", they "think".
They're just the result of matrix calculus, and your own pattern-recognition capacities fool you into thinking there is intelligence there. There isn't. They don't hallucinate; their output is wrong.
The worst example of anthropomorphism I've seen was the blog post from a researcher working on adversarial prompting. The tool spewing "help me" words made them think they were hurting a living organism https://www.lesswrong.com/posts/MnYnCFgT3hF6LJPwn/why-white-...
Speaking with AI proponents feels like speaking with cryptocurrencies proponents: the more you learn about how things work, the more you understand they don't and just live in lalaland.
If you lived before the invention of cars, and if when they were invented, marketers all said "these will be able to fly soon" (which of course, we know now wouldn't have been true), you would be underwhelmed? You wouldn't think it was extremely transformative technology?
Where does the premise come from that "artificial intelligence" is supposed to be infallible and superhuman? I think 20th century science fiction did a good job of establishing the premise that artificial intelligence will be sometimes useful but will often fail in bizarre ways that seem interesting to humans. Misunderstanding orders, applying orders literally in a way humans never would, or just flat out going haywire. Asimov's stories, HAL9000, countless others. These were the popular media tropes about artificial intelligence, and the "real deal" seems to line up with them remarkably well!
When businessmen sell me "artificial intelligence", I come prepared for lots of fuckery.
Have you considered that the problems you encounter in daily life just happen to be more present in the training data than problems other users encounter?
Stitching together well-known web technologies and protocols in well-known patterns, probably a good success rate.
Solving issues in legacy codebases using proprietary technologies and protocols, and non-standard patterns. Probably not such a good success rate.
I think you would benefit from a personalized approach. If you like, send me a Loom or similar of you attempting to complete one software task with AI, that fails as you said, and I'll give you my feedback. Email in profile.
Far from just programming too. They're useful for so many things. I use it for quickly coming up with shell scripts (or even complex piped commands (or if I'm being honest even simple commands since it's easier than skimming the man page)). But I also use it to bounce ideas off of when negotiating contracts. Or to give me a spoiler-free reminder of a plot point I'd forgotten in a book or TV series. Or to explain legal or taxation issues (which I of course verify, but it points me in the right direction). Or any number of other things.
As the parent says, while far from perfect, they're an incredible aid in so many areas. When used well, they help you produce not just faster but also better results. The only trick really is that you need to treat it as a (very knowledgeable but overconfident) collaborator rather than an oracle.
I love using it to generate boilerplate code for a new API I want to integrate. Much better than having to search manually. In the near future, not knowing how to effectively use AI to enhance productivity will be a disadvantage in the eyes of potential employers.
I use ChatGPT all the time. I really like it. It's not perfect; how I've described it (and I doubt that I'm unique in this): it's like having a really smart and eager intern at your disposal.
I say "intern" in the sense that its error-prone and kind of inexperienced, but also generally useful. I can ask it to automatically create a lot of the bootstrapping or tedious code that I always dread writing so that I can focus on the fun stuff, which is often the stuff that's pawned off onto interns and junior-level engineers. I think for the most part, when you treat it like that, it lives up to and sometimes even surpasses expectations.
I mean, I can't speak for everyone, but whenever I begin a new project, a large percentage of the first ~3 hours is simply copying and pasting and editing from documentation, either an API I have to call or some bootstrapping code from a framework or just some cruft to make built-in libraries work how you want. I hate doing all that, it actively makes me not want to start a new project. Being able to get ChatGPT to give me stuff that I need to actually get started on my project has made coding a lot more fun for me again. At this point, you can take my LLM from my cold dead hands.
I do think it will keep getting better, but I'm also at a point where even if it never improves I will still keep using it.
I guess it totally depends on the specific use cases.
Just yesterday, I asked Gemini 2.5 for how to test a Laravel artisan command that internally calls another artisan command using the Pest testing framework.
Gemini showed me a method that looked perfect.
But when I tried to execute it, I got an error that the method doesn't exist.
So I asked Gemini about this, and it told me: "Ah, apologies! That's a classic sign that you might be using a version of Laravel older than 5.7."
So I told it that I am actually using the latest release (12.3), to which it responded with: "You're right to question things when you get an error! However, let's clarify the Laravel version first.
As of today, March 27, 2025, the latest stable version of Laravel is Laravel 11, which was released in March 2024. Laravel 12 has not been released yet (it's expected roughly in Q1 2026 based on the usual schedule).
Could you please double-check the exact Laravel version you are using?"
So it did not believe me and I had to convince it first that I was using a real version. This went on for a while, with Gemini not only hallucinating stuff, but also being very persistent and difficult to convince of anything else.
Well, in the end it was still certain that this method should exist, even though it could not provide any evidence for it, and my searching through the internet and the Git history of the related packages also did not turn up any results.
So I gave up and tried it with Claude 3.7, which also could not provide a working solution.
In the end, I found an entirely different solution for my problem, but that wasn't based on anything the AIs told me, but just my own thinking and talking to other software developers.
I would not go so far as to call these AIs useless. In software development they can help with simple stuff and boilerplate code, and I found them a lot more helpful in creative work. This is basically the opposite of what I would have expected 5 years ago ^^
But for any important tasks, these LLMs are still far too unreliable.
They often feel like they have a lot of knowledge, but no wisdom.
They don't know how to apply their knowledge well, and they often basically brute-force it with a mix of strange creativity and statistical models apparently trained on a vast amount of internet content, a big part of which is troll content and satire.
My issue with higher ups pushing LLMs is that what slows me down at work is not having to write the code. I can write the code. If all I had to do was sit down and write code, then I would be incredibly productive because I'm a good programmer.
But instead, my productivity is hampered by issues with org communication, structure, siloed knowledge, lack of documentation, tech debt, and stale repos.
I have for years tried to provide feedback and get leadership to do something about these issues, but they do nothing and instead ask "How have you used AI to improve your productivity?"
I've had the same experience as you, and also rather recently. I had to learn two lessons: first, what I could trust it with (as with Wikipedia when it was new), and second, what makes sense to ask it (as with YouTube when it was new). Once I got that down, it is one fabulous tool to have on my belt, among many other tools.
Thing is, the LLMs that I use are all freeware, and they run on my gaming PC. Two to six tokens per second are alright honestly. I have enough other things to take care of in the meantime. Other tools to work with.
I don't see the billion dollar business. And even if that existed, the means of production would be firmly in the hands of the people, as long as they play video games. So, have we all tripled our salaries?
If we haven't, is that because knowledge work is a limited space that we are competing in, and LLMs are an equalizer because we all have them? Because I was taught that knowledge work was infinite. And the new tool should allow us to create more, and better, and more thoroughly. And that should get us all paid better.
Depends on your use case. If you don't need them to be the source of truth, then they work great, but if you do, the experience sucks because they're so unreliable.
The problems start when people start hyperventilating because they think since LLMs can generate tests for a function for you, that they'll be replacing engineers soon. They're only suitable for generating output that you can easily verify to be correct.
LLM training is designed to distill a massive corpus of facts, in the form of token sequences, into a much, much smaller bundle of information that encodes (somehow!) the deep structure of those facts minus their particulars.
They’re not search engines, they’re abstract pattern matchers.
I asked Grok to describe a picture I took of me and my kid at Hilton Head island. Based on the plant life, it guessed it was a southeast barrier island in Georgia or the Carolinas. It guessed my age and my son's age. LLMs are completely insane tech for a 90s kid. The first fundamental advance in tech I've seen in my lifetime - like what it must've been like for people who used a telephone for the first time, or watched a television.
Flat TVs, digital audio players (the iPod), the smartphone, laptops, smartwatches... You have a very selective definition of an advance in tech. Compare today (minus LLMs) with any movie depicting life in the nineties and you can see how much tech has developed.
There are basically 3 categories of LLM users (very roughly).
1. People creating or dealing with imprecise information. People doing SEO spam, people dealing with SEO spam, almost all creative arts people, people writing corporatese- or legalese- documents or mails, etc. For these tasks LLMs are god-like.
2. People dealing with precise information and/or facts. For these people, LLMs are no better than a parrot.
3. A subset of 2 - programmers. Because of the huge amount of stolen training data, plus almost perfect proofing software in the form of compilers, static analyzers etc., LLMs are more or less usable for this case; the more data was used, the better (JS is the best, as I understand).
This is why people's reaction is so polarizing. Their results differ.
The crisis in programming hasn’t been writing code. It has been developing languages and tools so that we can write less of it that is easy to verify as correct. These tools generate more code. More than you can read and more than you will want to before you get bored and decide to trust the output. It is trained on the most average code available that could be sucked up and ripped off the Internet. It will regurgitate the most subtle errors that humans are not good at finding. It only saves you time if you don’t bother reading and understanding what it outputs.
I don’t want to think about the potential. It may never materialize. And much of what was promised even a few years ago hasn’t come to fruition. It’s always a few years away. Always another funding round.
Instead we have massive amounts of new demand for liquid methane, infrastructure struggling to keep up, billions of gallons of fresh water wasted, all so that rich kids can vibe code their way to easy money and realize three months later they’ve been hacked and they don’t know what to do. The context window has been lost and they ran out of API credits. Welcome to the future.
Yeah, basically this. If I look at how it helps me as an individual, I can totally see how AI can sometimes be useful. If I look at the societal effect of AI, it becomes apparent that AI is just a net negative. Some examples:
- AI is great for disinformation
- AI is great at generating porn of women without their consent.
- Open source projects massively struggle as AI scrapers DDOS them.
- AI uses massive amounts of energy and water; most importantly, the expectation is that energy usage will rise drastically in a world where we need to lower it. If Sam Altman gets his way, we're toast.
- AI makes us intellectually lazy and worse thinkers. We were already learning less and less in school because of our impoverished attention span. This is even worse now with AI.
- AI makes us even more dependent on cloud vendors and third-parties, further creating a fragile supply chain.
Like AI ostensibly empowers us as individuals, but in reality I think it's a disservice, and the ones it truly empowers are the tech giants, as citizens become dumber and even more dependent on them and tech giants amass more and more power.
I can't believe I had to dig this deep to find this comment.
I have yet to see an AI-generated image that was "really cool".
AI images and videos strike me as the coffee pods of the digital world -- we're just absolutely littering the internet with garbage. And as a bonus, it's also environmentally devastating to the real world!
I live nearby a landfill, and go there often to get rid of yard waste, construction materials, etc. The sheer volume of perfectly serviceable stuff people are throwing out in my relatively small city (<200k) is infuriating and depressing. I think if more people visited their local landfills, they might get a better sense for just how much stuff humans consume and dispose. I hope people are noticing just how much more full of trash the internet has become in the last few years. It seems like it, but then I read this thread full of people that are still hyped about it all and I wonder.
This isn't even to mention the generated text... it's all just so inane and I just don't get it. I've tried a few times to ask for relatively simple code and the results have been laughable.
If you ask for obscure things, how do you know you are getting right answers? From my experience, unless the thing you are looking for is easily found with a Google search, LLMs have no hope of getting it correct. This is mostly from trying to code against an obscure API that isn't well documented, where the little documentation there is is spread across multiple wikis. And the LLMs keep hallucinating functions that simply do not exist.
It is an amazing technology and like crypto/blockchain it is nerdy to understand how it works and play with it. I think there are two things at stake here:
1. Some people are just uncomfortable with it because it “could” replace their jobs.
2. Some people are warning that the ecosystem bubble is significantly out of proportion. They are right, and having the whole stock market, companies and the US economy attached to LLMs is just downright irresponsible.
> Some people are just uncomfortable with it because it “could” replace their jobs.
What jobs are seriously at risk of being totally replaced by LLM's? Even in things like copywriting and natural language translation, which is somewhat of a natural "best case" for the underlying tech, their output is quite sub par compared to the average human's.
> And people just sit around, unimpressed, and complain that ... what ... it isn't a perfect superintelligence that understands everything perfectly
Hossenfelder is a scientist. There's a certain level of rigour that she needs to do her job, which is where current LLMs often fall down. Arguably it's not accelerating her work to have to check every single thing the LLM says.
I use them everyday and they save me so much time and enable me to do things that I wouldn't be able to do otherwise just due to the amount of time it would take.
I think some people just aren't using them correctly or don't understand their limitations.
They are especially helpful for getting me over thought paralysis when starting a new project.
The frustration of using an LLM is greater than the frustration of doing it myself. If it's going to be a tool, it needs to work. Otherwise, it's just a research toy.
They can do fun and interesting stuff, but we keep hearing how they’re going to replace human workers, and too many people in positions of power not only believe they are capable of this, but are taking steps to replace people with LLMs.
But while they are fun to play with, anything that requires a real answer, but can’t be directly and immediately checked, like customer support, scientific research, teaching, legal advice, identifying humans, correctly summarizing text - LLMs are very bad at these things, make up answers, mix contexts inappropriately, and more.
I’m not sure how you can have played with LLMs so much and missed this. I hope you don’t trust what they say about recipes or how to handle legal problems or how to clean things or how to treat disease or any fact-checking whatsoever.
>I’m not sure how you can have played with LLMs so much and missed this. I hope you don’t trust what they say about recipes or how to handle legal problems or how to clean things or how to treat disease or any fact-checking whatsoever.
This is like a GPT3.5 level criticism. o1-pro is probably better at pure fact retrieval than most PhDs in any given field. I challenge you to try it.
The main issue is that if you ask most LLMs to do something they aren't good at, they don't say "Sorry, I'm not sure how to do that yet." They say "Sure, absolutely! Here you go:" and proceed to make things up, provide numbers or code that don't actually add up, and invent references and sources.
To someone who doesn't actually check or have the knowledge or experience to check the output, it sounds like they've been given a real, useful answer.
When you tell the LLM that the API it tried to call doesn't exist it says "Oh, you're right, sorry about that! Here's a corrected version that should work!" and of course that one probably doesn't work either.
Yes. One of my early observations about LLMs was that we've now produced software that regularly lies to us. It seems to be a quite intractable problem. Also, since there's no real visibility as to how an LLM reaches a conclusion, there's no way to validate anything.
One takeaway from this is that labelling LLMs as "intelligent" is a total misnomer. They're more like super parrots.
For software development, there's also the problem of how up to date they are. If they could learn on the fly (or be constantly updated) that would help.
They are amazing in some ways, but they've been over-hyped tremendously.
I agree, they are an amazing piece of technology, but the investment and hype don't match the reality. This might age like milk, but I don't think OpenAI is going to make it. They burnt $9B to lose $5B in 2024, trying to raise money like their life depends on it... because their life depends on it. From what I can tell, none of the AI model producers are profiting from their model usage at this point, except maybe DeepSeek. There will be a market for this - they are useful, astonishingly impressive even - but IMO they are either going to become waaayy more expensive to use, and/or the market and investment will greatly shrink to something sustainable.
When I saw GPT-3 in action in 2023, I couldn’t believe my eyes. I thought I was being tricked somehow. I’d seen ads for “AI-powered” services and it was always the same unimpressive stuff. Then I saw GPT-3 and within minutes I knew it was completely different. It was the first thing I’d ever seen that felt like AI.
That was only a few years ago. Now I can run something on my 8GB MacBook Air that blows GPT-3 out of the water. It's just baffling to me when people say LLMs are useless or unimpressive. I use them constantly and I can still hardly believe they exist!!
LLMs are better at formally verifiable tasks like coding; coding also makes more money on a pure demand basis, so development for it gets more resources. In descriptive science fields it's not great, because those fields don't generate a lot of text compared to other things, so the training data is dwarfed by the huge corpus of general internet text. The software industry created the internet and loves using it, so it has published a lot more text in comparison. It can be really bad in bio, for example.
Is your testing adversarial or merely anecdotal curiosity? If you don't actively look for it why would you expect to find it?
It's bad technology because it wastes a lot of labor, electricity, and bandwidth in a struggle to achieve what most human beings can with minimal effort. It's also a blatant thief of copyrighted materials.
If you want to like it, guess what, you'll find a way to like it. If you try to view it from another person's use case, you might see why they don't like it.
> can ask for obscure things with subtle nuance where I misspell words and mess up my question and it figures it out. It talks to me like a person. It generates really cool images. It helps me write code. And just tons of other stuff that astounds me.
It is an impressive technology, but is it US$244.22bn [1] impressive (I know this stat is supposed to account for computer vision as well, but seeing as LLMs are now a big chunk of that, I think it's a safe assumption)? It's projected to grow to over US$1tr by 2031. That's higher than the market size of commercial aviation at its peak [2]. I'm sorry, but a cool chatbot is simply not as important as flying.
You no longer have the console as the primary interface, but a GUI, which 99.9+% of computer users control via a mouse.
You no longer have the screen as the primary interface, but an AUI, which 99.9+% of computer users control via a headset, earbuds, or a microphone and speaker pair.
You mostly speak and listen to other humans, and if you're not reading something they've written, you could have it read to you in order to detach from the screen or paper.
You'll talk with your computer while in the car, while walking, or while sitting in the office.
An LLM makes the computer understand you, and it allows you to understand the computer.
Even if you use smart glasses, you'll mostly talk to the computer generating the displayed results, and it will probably also talk to you, adding information to the displayed results. It's LLMs that enable this.
Just don't focus too much on whether the LLM knows how high Mount Kilimanjaro is; its knowledge of that fact is simply a hint that it can properly handle language.
Still, it's remarkable how useful they are at analyzing things.
LLMs have a bright future ahead, or whatever technology succeeds them.
I don't even dispute that they might become useful at some point, but when I point a mouse at a button and click it, it usually results in a reliable action.
When I use the LLM (I have so far tried: Claude, ChatGPT, DeepSeek, Mistral) it does something but that something usually isn’t what I want (~the linked tweet).
Prompting, studying and understanding the result, and then cleaning up the mess, for the low price of an expensive monthly sub, leaves me with worse results than if I did the thing myself; it usually takes longer and often leaves me with subtle bugs I'm genuinely afraid will grow into exploitable vulnerabilities.
Using it strictly as a rubber duck is neat but also largely pointless.
Since other people are getting something out of the tech, I’ll just assume that the hammer doesn’t fit my nails.
These are the beginnings and it will only improve. The premise is "I genuinely don't understand why some people are still bullish about LLMs", which I just can't share.
When the mouse and GUI were invented, nobody needed to say "just wait a couple years for it to improve and you'll understand why it's useful, until then please give me money". The benefits are immediately obvious and improve the experience for practically every computer user.
LLMs are very useful for some (mostly linguistic) tasks, but the areas where they're actually reliable enough to provide more value than just doing it yourself are narrow. But companies really need this tech to be profitable and so they try to make people use it for as many things as possible and shove it in everyone's face[0] in hopes that someone finds a use-case where the benefits are indeed immediately obvious and revolutionary.
[0] For example my dad's new Android phone by default opens a Gemini AI assistant when you hold the power button and it took me minutes of googling to figure out how to make it turn off the damn thing. Whoever at Google thought that this would make people like AI more is in the wrong profession.
It's like a mouse that some variable proportion of the time pretends it's moved the cursor and clicked a button, but actually it hasn't and you have to put a lot of work in to find out whether it did or didn't do what you expected.
It used to be annoying enough just having to clean the trackball, but at least you knew when it wasn't working.
I think it’s more that the people who are boosting LLMs are claiming that perfect super intelligence is right around the corner.
Personally, I look back at how many years ago it was that we were seeing claims that truck drivers were all going to lose their jobs and society would tear itself apart over it within the next few years… and yet here we still are.
I'm completely with you. The technology is absolutely fascinating in its own right.
That said, I do experience frustrations:
- Getting enraged when it messes up perfectly good code it wrote just 10 minutes ago
- Constantly reminding it we're NOT using jest to write tests
- Discovering it's created duplicate utilities in different folders
There's definitely a lot of hand-holding required, and I've encountered limitations I initially overlooked in my optimism.
But here's what makes it worthwhile: LLMs have significantly eased my imposter syndrome when it comes to coding. I feel much more confident tackling tasks that would have filled me with dread a year ago.
I honestly don't understand how everyone isn't completely blown away by how cool this technology is. I haven't felt this level of excitement about a new technology since I discovered I could build my own Flash movies.
It depends. For small tasks like summarization or self-contained code snippets, it’s really good—like figuring out how to inspect a binary executable on Linux, or designing a ranking algorithm for different search patterns. If you only want average performance or don’t care much about the details, it can produce reasonable results without much oversight.
But for larger tasks—say, around 2,000 lines of code—it often fails in a lot of small ways. It tends to generate a lot of dead code after multiple iterations, and might repeatedly fail on issues you thought were easy to fix. Mentally, it can get exhausting, and you might end up rewriting most of it yourself. I think people are just tired of how much we expect LLMs to deliver, only for them to fail us in unexpected ways. The LLM is good, but we really need to push to understand its limitations.
This is true. But it needs to be more than a toy if it is to be economically viable.
So far the industrial applications haven't been that promising, code writing and documentation is probably the most promising but even there it's not like it can replace a human or even substantially increase their productivity.
I think its perception of usefulness depends on how often you ask/google questions. If you are constantly wondering about X thing, LLMs are amazing - especially compared to previous alternatives like googling or asking on Reddit.
If you don’t constantly look for information, they might be less useful.
I'm a senior engineer with 20 years of experience, and I mostly find all of the AI bs of the last couple of years to be occasionally helpful for general stuff but absolutely incompetent when I need help with mildly complicated tasks.
I did have a eureka moment the other day with deepseek and a very obscure bug I was trying to tackle. One api query was having a very weird, unrelated side effect. I loaded up cursor with a very extensive prompt and it actually figured out the call path I hadn't been able to track down.
Today, I had a very simple task that eventually only took me half an hour to manually track. But I started with cursor using very similar context as the first example. It just kept repeatedly dreaming up non-existent files in the PR and making suggestions to fix code that doesn't exist.
So what's the worth to my company of my very expensive time? Should I spend 10, 20, or 50 percent of my time trying to get answers from a chatbot, or should I just use my 20 years of experience to get the job done?
I've been playing with Gemini 2.5 Pro, throwing all kinds of problems at it that will help me with personal productivity, and it's mostly one-shotting them. I'm still in disbelief, tbh.
A lot of people who don’t understand how to use LLM effectively will be at an economic disadvantage.
Can you give some examples? Do you mean things like "How do I control my crippling anxiety", things like "What highways would be best to take to Chicago", things like "Write me a Python library to parse the file format in this hex dump", or things like "What should I make for dinner"?
"The growth of the Internet will slow drastically, as the flaw in 'Metcalfe's law' becomes apparent: most people have nothing to say to each other! By 2005, it will become clear that the Internet's impact on the economy has been no greater than the fax machine's."
Same as reading books, Internet, Wikipedia, working towards/keeping your health and fitness, etc...
The quote about books being a mirror reflecting genius or idiocy seems to apply.
I see LLMs as a kind of hyper-keyboard: speeding up typing AND structuring content, completing thoughts, and inspiring ideas.
Unlike a regular keyboard, an LLM transforms input contextually. One no longer merely types but orchestrates concepts and modulates language, almost like music.
Yet mastery is key. Just as a pianist turns keystrokes into a symphony through skill, a true virtuoso wields LLMs not as a crutch but as an amplifier of thought.
As a 50+ nerd, for decades I carried the idea: can't we just build a sufficiently large neural net, throw some data at it, and have it somehow be usefully intelligent? So this is kind of showing strong signs of something I've been waiting for.
In the 70's I read in some science book for kids about how one day we will likely be able to use light emitting diodes for illumination instead of light bulbs, and this "cold light" will save us lots of energy. Waited out that one too; it turned out so.
I’m reminded of how I always think current cutting edge good examples of CG in movies looks so real and then, consistently, when I watch it again in 10 years it always looks distractingly shitty.
I honestly believe the GP comment demonstrates a level of gullibility that AI hypesters are exploiting.
Generative LLMs are text[1] generators, statistical machines extruding plausible text. To the extent that a human believes the output to be credible, it exhibits all the hallmarks of a confidence game. Once you know the trick, after Toto pulls back the curtain[2], it's not all that impressive.
1. I'm aware that LLMs can generate images and video as well. The point applies.
Perhaps you have already paid off your mortgage and saved up a million dollars for retirement? And you're not threatened by dismissal or salary reduction because supposedly "AI will replace everyone."
By the way, you don't need to be a 50+ year old nerd. Nerds are a special culture-pen where smart straight-A students from schools are placed so they can work, increase stakeholder revenues, and not even accidentally be able to do anything truly worthwhile that could redistribute wealth in society.
> And people just sit around, unimpressed, and complain that ... what ... it isn't a perfect superintelligence that understands everything perfectly
More like we note the frequency with which these tools produce shallow, bordering-on-useless responses, note the frequency with which they produce outright bullshit, and conclude their output should not be taken seriously. This smells like the fervor around ELIZA, but with several multinational marketing campaigns pushing behind it.
Yeah, like I. I. Rabi said in regard to people no longer being amazed by the achievements of physics, "What more do you want, mermaids?"
Anyone who remembers further back than a decade or so remembers when the height of AI research was chess programs that could beat grandmasters. Yes, LLMs aren't C3PO or the like, but they are certainly more like that than anything we could imagine just a few years ago.
The speed at which anything progresses is impressive if you're not paying attention while other people toil away on it for decades, until one day you finally look up and say, "Wow, the speed at which this thing progressed is insane!"
I remember seeing an AI lab in the late 1980's and thinking "that's never going to work" but here we are, 40 years later. It's finally working.
I'm glad I'm not the only person in awe of LLMs. It feels like they came straight out of a science fiction novel. What does it take to impress people nowadays?
I feel like if teleportation was invented tomorrow, people would complain that it can't transport large objects so it's useless.
I often ask "So you say LLMs are worthless because you can't blindly trust the first thing they say? Do you blindly trust the first google search result? Do you trust every single thing your family members tell you?" It reminds me of my high school teachers saying Wikipedia can't be trusted.
Yeah the amount of "piffle work" that LLMs save me is astounding. Sure, I can look up fifty different numbers and copy them into excel. Or I can just tell an LLM "make a chart comparing measurements XYZ across devices ABC" and I'm looking at the info right there.
Probably because you don't have the same use-case as them... doing "code" is an "easy" use-case, but pondering a humanities subject is much harder... you cannot "learn the structure" of the humanities; you have to know the facts... and LLMs are bad at that.
Because we're being told it is a perfect superintelligence, that it is going to replace senior engineers. The hype cycle is real, and worse than blockchain ever was. I'm sure LLMs will be able to code a full enterprise app around the same time moon coin replaces the US dollar.
I wholeheartedly agree with you and it’s funny reading the replies to your comment.
Basically, people are just doubling down on everything you described. I can't quite put a finger on it, but it has a tinge of insecurity or something like that. I hope that's not the case and I'm just misinterpreting.
It's like computer graphics and VR: Amazing advances over the years, very impressive, fun, cool, and by no means a temporary fad...
... But I do not believe we're on the cusp of a Lawnmower-Man future where someone's Metaverse eats all the retail-conference-halls and movie-theaters and retail-stores across the entire globe in an unbridled orgy of mind-shattering investor returns.
Similarly, LLMs are neat and have some sane uses, but the fervor about how we're about to invent the Omnimind and usher in the singularity and take over the (economic) world? Nah.
Today's models are far from autonomous thinking machines, and it is a cognitive bias among the masses to believe otherwise. It is just a giant calculator. It predicts "the most probable next word" from a sea of all combinations of next words.
I don't see it as a bigger leap than the internet itself. I recall needing books on my desk or a road trip to the local bookshop to find out coding answers. Stack Overflow beats AI most days, but the LLMs are another nice tool.
Exploring topics in a shallow fashion is fine with LLMs; doing anything deep is just too unreliable due to hallucination. All the models I've talked to desperately want to give a positive answer, and thus will often just lie.
Indeed, it is the stuff of science fiction, and then you get an "akshually, it's just statistics" comment. I feel people are projecting their fears, because deep down, they're simply afraid.
I like LLMs for what they are. Classifiers. I don’t trust them as search engines because of hallucinations. I use them to get a bearing on a subject but then I’ll turn to Google to do the real research.
I go back and forth. I share your amazement. I used Gemini Deep Research the other day and was blown away. It claimed to have read 20 websites, and it showed its "thinking" and steps, its conclusions at each step. Then it wrote a large summary (several pages).
On the other hand, I saw github recently added Copilot as a code reviewer. For fun I let it review my latest pull request. I hated its suggestions but could imagine a not too distant future where I'm required by upper management to satisfy the LLM before I'm allowed to commit. Similarly, I've asked ChatGPT questions and it's been programmed to only give answers that Silicon Valley workers have declared "correct".
The thing I always find frustrating about the naysayers is that they seem to think how it works today is the end of it. I recently listened to an episode of EconTalk interviewing someone on AI and education. She lives in the UK and used Tesla FSD as an example of how bad AI is. Yet I live in California and see Waymo mostly working today, with lots of people using it. I believe she wouldn't have used the Tesla FSD example, and would possibly have changed her world view at least a little, if she'd updated on seeing self-driving work.
What you're impressed with is 40% the human skill in creating an LLM, 0.5% value created by the model, and 59.5% the skills of all the people it ate and whose livelihoods it is now trying to destroy.
As others have pointed out already, the hype about writing code like a senior engineer, or in general acting as a competent assistant, is what created the expectation in the first place. They keep over-promising but under-delivering. Who is the guy talking about AGI most of the time? Could it be the top executive of one of the largest gen-AI companies, do you think? I won't deny it occasionally has a certain 'star-trek-computer' flair to it, but most of the time it feels like having a heavily degraded version of "Rain Man": he may count your cards perfectly one moment, then get stuck trying to untie his shoes. I stopped counting how many times it produced outright wrong outputs, to the point of suggesting literally the opposite of what one is asking. I would not mind it so much if they were being advertised for what they are, not for what they could potentially be if only another half a trillion dollars were invested in data centers. It is not going to happen with this technology; the issue is structural, not resource-related.
Really? I just get garbage. Both Claude and Copilot kept insisting that it was OK to use React hooks outside of function components. There have been many other situations where it gave me some code and, even after refining the prompt, it just gave me wrong or non-working code. I'm not expecting perfection, but at least don't waste my time with hallucinations or flat-out stuff that doesn't work.
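(For context, the rule the models kept getting wrong is well established: React hooks may only be called from the top level of a function component or a custom hook, never at module scope or inside plain functions. A minimal sketch; the Counter component here is just my own illustration:)

```tsx
import { useState } from "react";

// Invalid: calling a hook at module scope (or in any plain function)
// triggers React's "Invalid hook call" error at runtime.
// const [count, setCount] = useState(0);

// Valid: the hook is called at the top level of a function component.
export function Counter() {
  const [count, setCount] = useState(0);
  return <button onClick={() => setCount(count + 1)}>{count}</button>;
}
```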
> And people are like, "Wah, it can't write code like a Senior engineer with 20 years of experience!"
Except this isn't true. The code quality varies dramatically depending on what you're doing, the length of the chat/context, etc. It's an incredible productivity booster, but even earlier today, I wasted time debugging hallucinated code because the LLM mixed up methods in a library.
The problem isn't so much that it's not an amazing technology, it's how it's being sold. The people who stand to benefit are speaking as though they've invented a god and are scaring the crap out of people making them think everyone will be techno-serfs in a few years. That's incredibly careless, especially when as a technical person, you understand how the underlying system works and know, definitively, that these things aren't "intelligent" the way they're being sold.
Like the startups of the 2010s, everyone is rushing, lying, and huffing hopium deluding themselves that we're minutes away from the singularity.
You forget the large group of people who proudly declare they are inventing AGI and can make everyone lose their jobs and starve. The complaints are for them, not for you.
Keep in mind it understands nothing. The notion that LLMs understand anything is fundamentally flawed, as they do not demonstrate any markers of understanding.
The fact that you don't know what "Markov chain" means and get angry at others over it is what pisses me off. Both are Markov chains; that you used to erroneously think a Markov chain is a way to make a chatbot, rather than a general mathematical process, is on you, not them.
Not one of them has managed to generate a successful promise-based implementation of reCAPTCHA v2 in JavaScript from scratch (https://developers.google.com/recaptcha/docs/loading), even though they have a million+ references for this.
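(For reference, the shape of what's being asked for isn't exotic. Here's a minimal sketch in TypeScript of a promise-based wrapper, assuming the standard explicit-render flow from the linked docs; the helper names loadRecaptcha and renderAndGetToken, the container id, and the site key are my own placeholders, not anything from the docs:)

```ts
// Minimal sketch of promise-based reCAPTCHA v2 loading (explicit render).
// Assumes the grecaptcha API described in the linked docs; helper names are illustrative.
declare const grecaptcha: {
  render: (
    container: string | HTMLElement,
    params: { sitekey: string; callback: (token: string) => void }
  ) => number;
};

// Inject api.js and resolve once its onload callback fires.
function loadRecaptcha(): Promise<void> {
  return new Promise((resolve, reject) => {
    (window as any).onRecaptchaLoad = () => resolve();
    const script = document.createElement("script");
    script.src =
      "https://www.google.com/recaptcha/api.js?onload=onRecaptchaLoad&render=explicit";
    script.async = true;
    script.defer = true;
    script.onerror = () => reject(new Error("failed to load reCAPTCHA api.js"));
    document.head.appendChild(script);
  });
}

// Render a v2 widget into a container and resolve with the response token
// once the user completes the challenge.
function renderAndGetToken(container: string, sitekey: string): Promise<string> {
  return new Promise((resolve) => {
    grecaptcha.render(container, { sitekey, callback: resolve });
  });
}

// Usage (hypothetical container id and site key):
// await loadRecaptcha();
// const token = await renderAndGetToken("captcha-container", "YOUR_SITE_KEY");
```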
Because the marketers oversold it. That is why you are seeing a pushback. I also outright rejected them because 1) they were sold and marketed as end-all-be-all replacements for human thought, and 2) they promised to replace only the parts of my job that I enjoy. Billboards went up in San Francisco telling my "bosses" that I was soon to be replaced, and the loudest and earliest voices told me that the craft I love is dead. Imagine NASCAR drivers excitedly discussing how cool it is that they won't have to turn left anymore - it made me wonder why everyone else was even here.
It was, more or less, the same narrative arc as Bitcoin, and was (is) headed for a crash.
That said, I've spent a few weeks with augment, and it is revelatory, certainly. All the marketing - aimed at a suite I have no interest in - managed to convince me it was something it wasn't. It isn't a replacement, any more than a power drill is a replacement for a carpenter.
What it is, is very helpful. "The world's most fully functioning scaffolding script", an upgrade from Copilot's "the world's most fully functioning tab-completer". I appreciate its usefulness as a force multiplier, but I am already finding corners and places where I'd just prefer to do it myself. And this is before we get into the craft of it all - I am not excited by the pitch "worse code, faster", but the utility is undeniable on this capitalistic hell planet, and I'm not a huge fan of writing SQL queries anyway, so here we are!
For me, LLMs are a bit like being shown a talking dog with the education and knowledge of a first grader: a talking dog is amazing in itself, and a truly impressive technical feat; that said, you wouldn't have the dog file your taxes or represent you in court.
To quote Joel Spolsky, "When you’re working on a really, really good team with great programmers, everybody else’s code, frankly, is bug-infested garbage, and nobody else knows how to ship on time.", and that's the state we end up if we believe in the hype and use LLMs willy-nilly.
That's why people are annoyed: not because LLMs cannot code like a senior engineer, but because lots of content marketing and company valuations depend on making people believe they can.
I mean. How would you feel if you coded a menu in Python with certain choices but when you used it the choices were never the same or in the same order, sometimes there were fake choices, sometimes they are improperly labelled and sometimes the menu just completely fails to open. And you as a coder and you as a user have absolutely no control over any of those issues. Then, when you go online to complain people say useful stuff like "Isn't it amazing that it does anything at all!? Give us a break, we're working on it bro."
That's how I see LLMs and the hype surrounding them.
a lot of it is just plain denial. a certain subgenre of person will forever attack everything AI does because they feel threatened by it and a certain other subgenre of person will just copy this behaviour and parrot opinions for upvotes/likes/retweets.
I'll keep bringing up this example whenever people dismiss LLMs.
I can ask Claude the most inane programming question and get an answer. If I were to do that on StackOverflow, I'd get downvoted, rude comments, and my question closed for being off-topic. I don't have to be super knowledgeable about the thing I'm asking about with Claude (or any LLM, for that matter).
Even if you ignore the rudeness and elitism of power-users of certain platforms, there's no more waiting for someone to respond to your esoteric questions. Even if the LLM spews bullshit, you can ask it clarifying questions or rephrase until you see something that makes sense.
I love LLMs, I don't care what people say. Even when I'm just spitballing ideas[1], the output is great.
For me, I think they're valuable but also overhyped. They're not at the point of replacing entire dev teams, as some articles claim. In addition, they are amazingly accurate sometimes and amazingly misleading other times. I've noticed some ardent advocates ignore the latter.
It's incredibly frustrating when people think they're a miracle tool and blindly copy/paste output without doing any kind of verification. This is especially frustrating when someone who's supposed to be a professional in the field is doing it (copy-pasting non-working AI-generated code and putting it up for review).
That said, on one hand they multiply productivity and useful information. On the other hand, they kill productivity and spread misinformation. So I still see them as useful, but not a miracle.
I blame overpromised expectations from startups and public companies, screaming about AGI and superintelligence.
Truly amazing technology that is very good at generating and correcting text is being marketed as a senior developer, a talented artist, and a black box that has the solution to all your problems. This impression shatters on the first blatant mistake, e.g. counting elephant legs: https://news.ycombinator.com/item?id=38766512
It's the classic HN-like anti-anything bubble we see with JavaScript frameworks. Hundreds of thousands of people are productive with them and enjoy them. They created entire industries and job fields. The same is happening with LLMs, but the usual counter-culture dev crowd is denying it while it's happening right before their eyes. I too use LLMs every day. I never click on a link and find it doesn't exist. When I want to take my mind off of things, I just talk with GPT.
You're being disingenuous. The tweet was talking about asserting the existence of fake articles, claiming that a paper was written in one year while summarizing a paper that explicitly says it was written in another, and severe hallucinations. Nowhere does she even imply that she's looking for superintelligence.
What I find interesting is that my experience has been 100% the opposite. I've been using ChatGPT, Claude, and Gemini for almost a year (well, only ChatGPT for the full year, since the rest are more recent). I've been using them to help build circuits and write code. They are almost always wrong with circuit design, and create code that doesn't work north of 80% of the time. My patience has dropped off to the point where I only experiment with LLMs a few times a week because they are so bad. Yes, it is miraculous that we can have a conversation, but it means nothing if the output is always wrong.
But I will admit the dora muckbang feet shit is fucking insane. And that just flat out scares the pants off me.
>They are almost always wrong with circuit design, and create code that doesn’t work north of 80% of the time.
Sorry but this is a total skill issue lol. 80% code failure rate is just total nonsense. I don't think 1% of the code I've gotten from LLMs has failed to execute correctly.
LLMs can't be trusted. They are like an overconfident idiot who is pretending quite impressively, but if you check the result there's just a bit too much bullshit in it. So there's practically zero gain in using LLMs except when you actually need a text that's nice, eloquent bullshit.
Almost every time I've tried using LLMs, I've fallen into the pattern of calling out, correcting, and arguing with the LLM, which is of course silly in itself, because they don't learn; they don't really "get it" when they are wrong. There's none of the benefit of talking to a human.
This is the place where tech shiny meets actual use cases, and users aren’t really good at articulating their problems.
It's also a slow-burn issue - you have to use it for a while for what is obvious to users to become obvious to people who are tech-first.
The primary issue is the hype and forecasted capabilities vs actual use cases. People want something they can trust as much as an authority, not as much as a consultant.
If I were to put it in a single sentence? These are primarily narrative tools, being sold as factual /scientific tools.
When this is pointed out, the conversation often shifts to “well people aren’t that great either”. This takes us back to how these tools are positioned and sold. They are being touted as replacements to people in the future. When this claim is pressed, we get to the start of this conversation.
Frankly, people on HN aren’t pessimistic enough about what is coming down the pipe. I’ve started looking at how to work in 0 Truth scenarios, not even 0 trust. This is a view held by everyone I have spoken to in fraud, misinformation, online safety.
There’s a recent paper which showed that GAI tools improved the profitability of Phishing attempts by something like 50x in some categories, and made previously loss making (in $/hour terms) targets, profitable. Schneier was one of the authors.
A few days ago I found out someone I know who works in finance, had been deepfaked and their voice/image used to hawk stock tips. People were coming to their office to sue them.
I love tech, but this is the dystopia part of cyberpunk being built. These are narrative tools, good enough to make people think they are experts..
The thing LLMs are really really good at, is sounding authoritative.
If you ask it random things the output looks amazing, yes. At least at first glance. That's what they do. It's indeed magical, a true marvel that should make you go: Woooow, this is amazing tech: Coming across as convincing, even if based on hallucinations, is in itself a neat trick!
But is it actually useful? The things they come up with are untrustworthy and on the whole far less good than previously available systems. In many ways, insidiously worse: It's much harder to identify bad information than it was before.
It's almost like we designed a system to pass Turing tests with flying colours but forgot that usefulness is what we actually wanted, not authoritative, human-sounding bullshit.
I don't think the LLM naysayers are 'unimpressed', or that they demand perfection. I think they are trying to make statements aimed at balancing things:
Both the LLMs themselves, and the humans parroting the hype, are severely overstating the quality of what such systems produce. Hence, and this is a natural phenomenon you can observe in all walks of life, the more skeptical folks tend to swing the pendulum the other way, and thus it may come across to you as them being overly skeptical instead.
I totally agree, and this community is far from the worst. In trans communities there's incredible hostility towards LLMs - even local ones. "You're ripping off artists", "A pissing contest for tech bros", etc.
I'm trans, and I don't disagree that this technology has aspects that are problematic. But for me at least, LLMs have been a massive equalizer in the context of a highly contentious divorce where the reality is that my lawyer will not move a finger to defend me. And he's lawyer #5 - the others were some combination of worse, less empathetic, and more expensive. I have to follow up a query several times to get a minimally helpful answer - it feels like constant friction.
ChatGPT was a total game-changer for me. I told it my ex was using our children to create pressure - feeding it snippets of chat transcripts. ChatGPT suggested this might be indicative of coercive control abuse. It sounded very relevant (my ex even admitted one time, in a rare, candid moment, that she feels a need to control everyone around her), so I googled the term - essentially all the components were there except physical violence (with two notable exceptions).
Once I figured that out, I asked it to tell me about laws related to controlling relationships - and it suggested both laws directly addressing it (in the UK and Australia) and the closest laws in Germany (Nötigung, Nachstellung, violations of dignity, etc.), translating them to English - my best language. Once you name specific laws broken and provide a rationale for why there's a Tatbestand (i.e., the criterion for a violation is fulfilled), your lawyer has no option but to take you more seriously. Otherwise he could face a malpractice suit.
Sadly, even after naming specific law violations and pointing to email and chat evidence, my lawyer persists in dragging his feet - so much so that the last legal letter he sent wasn't drafted by him - it was ChatGPT. I told my lawyer: read, correct, and send to X. All he did was to delete a paragraph and alter one or two words. And the letter worked.
Without ChatGPT, I would be even more helpless and screwed than I am. It's far from clear I will get justice in a German court, but at least ChatGPT gives me hope, a legal strategy. Lastly - and this is a godsend for a victim of coercive control - it doesn't degrade you. Lawyers do. It completely changed the dynamics of my divorce (4 years - still no end in sight, lost my custody rights, then visitation rights, was subjected to confrontational and gaslighting tactics by around a dozen social workers - my ex is a social worker -, and then I literally lost my hair: telogen effluvium, tinea capitis, alopecia areata... if it's stress-related, I've had it), it gave me confidence when confronting my father and brother about their family violence.
It's been the ONLY reliable help, frankly, so much so I'm crying as I write this. For minorities that face discrimination, ChatGPT is literally a lifeline - and that's more true the more vulnerable you are.
I agree. I recently asked if a certain GPU would fit in a certain computer... and it understood that "fit" could mean physically inside but could also mean that the interface is compatible, and it answered both.
It did. It mentioned PCIe connectors, what connects to what, and said this computer has a motherboard with such-and-such PCIe, the card needs such-and-such, so it's compatible. Regarding physical size, it said that it depends on the size of the case (implying that it understood that the size of the card is known but the size of the computer isn't known to it).
It's quite insulting that you just assume I don't know how to read specs. You're either assuming based on nothing, or you're inferring from my comment in which case I worry for your reading comprehension. At no point did I say I didn't know how to find the answer or indeed that I didn't know the answer.
TBH, they produce trash results for almost any question I might want to ask them. This is consistently the case. I must use them differently than other people.
LLMs produce midwit answers. If you are an expert in your domain, the results are kind of what you would expect for someone who isn’t an expert. That is occasionally useful but if I wanted a mediocre solution in software I’d use the average library. No LLM I have ever used has delivered an expert answer in software. And that is where all the value is.
I worked in AI for a long time, I like the idea. But LLMs are seemingly incapable of replacing anything of value currently.
The elephant in the room is that there is no training data for the valuable skills. If you have to rely on training data to be useful, LLMs will be of limited use.
Here’s when we can start getting excited about LLMs: when they start making new and valid scientific discoveries that can radically change our world.
When an AI can say “Here’s how you make better, smaller, more powerful batteries, follow these plans”, then we will have a reason to worship AI.
When AI can bring us wonders like room-temperature superconductors, fast interstellar travel, anti-gravity tech, and solutions to world hunger and energy consumption, then it will have fulfilled the promise of what AI could do for humanity.
Until then, LLMs are just fancy search and natural language processors. Puppets with strings. It’s about as impressive as Google was when it first came out.
And people just sit around, unimpressed, and complain that ... what ... it isn't a perfect superintelligence that understands everything perfectly? This is the most amazing technology I've experienced as a 50+ year old nerd that has been sitting deep in tech for basically my whole life. This is the stuff of science fiction, and while there totally are limitations, the speed at which it is progressing is insane. And people are like, "Wah, it can't write code like a Senior engineer with 20 years of experience!"
Crazy.