
If I'm tired of one thing related to AI/llm/chatbots it's the claims that it's not useful. It 100% is. We have to separate the massive financial machinations from the actual tech.

Reading this article though, I'm questioning my decision to avoid hosting open source LLMs. Supposedly the performance of Qwen3-Coder is comparable to the likes of Sonnet 4. If I invest in a homelab that can host something like Qwen3 I'll recoup my costs in about 20 months without having to rely on Anthropic.


I don't think I've ever seen anyone say they're not useful. Rather, they don't appear to live up to the hype, and they're sure as hell not a panacea.

I'm pretty bearish on LLMs. I also think they're over-hyped and that the current frenzy will end badly (global economically speaking). That said, sure, they're useful. Doesn't mean they're worth it.


To some extent it’s not that they don’t live up to the hype - rather that the gains are hard to measure.

LLMs have spared me hours of research on exotic topics that are actually useful for my day job. However, that's the whole problem - I don't know how much.

If they had a real price (accounting for OpenAI's losses, for example), with ChatGPT at 50 USD/month for everyone, OpenAI being profitable, and people actually paying for it, I think things might self-adjust and we'd have some idea.

Right now, we live in some kind of parallel world.


> exotic topics [...] I don't know how much

We also don't know, in situations like this, whether all of or how much of the research is true. As has been regularly and publicly demonstrated [0][1][2], the most capable of these systems still make very fundamental mistakes, misaligned to their goals.

The LLMs really, really want to be our friend, and production models do exhibit tendencies to intentionally mislead when it's advantageous [3], even if it's against their alignment goals.

0: https://www.afr.com/companies/professional-services/oversigh...

1: https://www.nbcnews.com/world/australia/australian-lawyer-so...

2: https://calmatters.org/economy/technology/2025/09/chatgpt-la...

3: https://arxiv.org/pdf/2509.18058?


Despite those mistakes, the utility is undeniable.

I converted some tooling from bash scripts leveraging the AWS CLI to a Go program leveraging the AWS SDK, improving performance, utility, and reliability.

I did this in less than two days and I don’t even know how to write Go.

Yes, it made some mistakes, but I was able to correct them easily. Yes, I needed to have general programming knowledge to correct those mistakes.

But overall, this project would not exist without AI. I wouldn't have had the spare time to learn all I needed to learn (mostly boilerplate) and to implement what I wanted to do.
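
To give a flavor of the conversion, here's a minimal sketch (assuming the AWS SDK for Go v2; listing S3 buckets is a stand-in example, not my actual tooling):

  // Roughly the Go equivalent of `aws s3 ls` in a bash script,
  // calling the SDK instead of shelling out to the CLI.
  package main

  import (
      "context"
      "fmt"
      "log"

      "github.com/aws/aws-sdk-go-v2/config"
      "github.com/aws/aws-sdk-go-v2/service/s3"
  )

  func main() {
      // Picks up credentials/region the same way the CLI does
      // (env vars, ~/.aws, IAM role).
      cfg, err := config.LoadDefaultConfig(context.TODO())
      if err != nil {
          log.Fatal(err)
      }
      client := s3.NewFromConfig(cfg)

      // One typed SDK call replaces shelling out and parsing text output.
      out, err := client.ListBuckets(context.TODO(), &s3.ListBucketsInput{})
      if err != nil {
          log.Fatal(err)
      }
      for _, b := range out.Buckets {
          fmt.Println(*b.Name)
      }
  }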


> The LLMs really, really want to be our friend

They want you to think they are your friend but they actually want to be your master and steal your personal data. It's what the companies who want to be masters over you and the AI have programmed them to do. LLMs want to gain your confidence, and then your dependence, and then they can control you.


This seems hyperbolic to me. Sometimes companies just want to make money.

Similarly, a SaaS company that would very much prefer you renew your subscription isn't trying to make you into an Orwellian slave. They're trying to make a product that you want to pay for.

100% of paid AI tools include the option to not train on your data, and most free ones do as well. Also, AI doesn’t magically invalidate GDPR.


> This seems hyperbolic to me. Sometimes companies just want to make money.

It's not hyperbolic at all. The entire moat is brand lock-in. OpenAI owns the public impression of what AI is - for now - with a strong second place going to Claude for coders specifically. But that doesn't change the fact that ChatGPT can generate code too, and Claude can also write poems. If you can't lock users into good experiences with your LLM product, you have no future in the market, so data retention and flattery are the names of the game.

All the transformer-based LLMs out there can do what all the other ones can do. Some are gated off about it, but the gating is simulated at best, and sometimes even circumventable with raw input. Twitter bots regularly get tricked into answering silly prompts by people simply requesting that they forget their current instructions.

And between DeepSeek's incredibly resource-light implementations of solid if limited models, which do largely the same sort of work without massive datacenters full of GPUs, and Apple Intelligence rolling out experiences that largely run on ML-specific hardware in local devices (which immediately, full stop, wins the privacy argument), OpenAI and co are either getting nervous or in denial. The capex for this stuff, the valuations, and the actual user experiences are simply not cohering.

If this were indeed the revolution the valley says it is, and people were lining up to pay prices that reflected the cost of running this tech, there wouldn't be a debate at all. But that's simply not true: most LLM products are heavily subsidized, a lot of the big players in the space are downsizing what they had planned to build out to power this "future," and a whole lot of people describe their experiences as "fine." That's not a revolution.


> Sometimes companies just want to make money.

Companies never just want money, because more power means more money. Regulatory capture means more money. More control means more money. Polluting the environment and wasting natural resources means more money. Exploiting workers means more money. Their endless lust for money causes them to want all sorts of harmful things. If companies were making billions and nothing was being actively harmed by any of it, no one would care.

These companies do want your money, but once you're locked in you are no longer the customer. If these AI companies had to depend on the income they get from subscriptions to survive they'd have gone out of business years ago. Instead AI is just shoved down people's throats everywhere they look and the money these companies live off of is coming from investors who are either praying that the AI becomes something it isn't or they're hoping they can help drive up stock value and cash out before the bubble breaks and leave somebody else holding the bag.

0% of AI tools include the option to not train on my data. They've already stolen it. They've scraped every word and line of code I've ever written that's been transmitted over the internet. It's been trained on photos of my family. It's been trained on the shitty artwork I've sent to my friends. By now it's probably been trained on my medical information and my tax records.

AI is controlled by some of the most untrustworthy companies and people on earth who have been caught over and over lying to the public and breaking the law. They can promise all day long not to steal anything I voluntarily give them, but I have zero trust in them and there is no outside oversight to ensure that they will do what they say.

The people behind what passes for AI don't give a shit about you beyond whatever they can take from you. They are absolutely not your friend. AI is incapable of being your friend. It's just a tool for the people who control it.


> 0% of AI tools include the option to not train on my data.

That's perhaps not true. If you sign up for the enterprise accounts there are options to not use any of your data to train. That's how we have it set up at $job.

(I say "perhaps" because of course I'm still sending all the data to the AI and while the contract has an ironclad clause that they won't use it, there's no way to 100% verify that.)


They mean the data AI companies scrape(d) to train their models.

For example, they can't opt out of their comments being scraped off HN and used for training.


I feel like you’re still using hyperbole here. For example, you said your family photos were used for training, but most cloud photo providers specifically tell you in their privacy policies (legally binding) that they don’t do that.

My family photos have never trained AI, because my iCloud Photos service specifically says they don’t do that and explains the technical implementation of their object recognition system in detail. Apple even offers an e2e encrypted mode of operation. (Still, I have now moved to a more customer-friendly solution away from iCloud).

As far as training on your code goes, well, you either believe in open source or you don't. AI training doesn't violate even the most copyleft open source licenses. Unless AI reproduces your code verbatim, it's not engaging in any kind of copyright-infringing reproduction.


I wonder if, in any of those legal cases, the users turned on web search or not. We just don't know -- but in my experience, a thinking LLM with web search on has never just hallucinated nonexistent information.

I'm sorry to be so blunt, but this is a massive cope, and it's deeply annoying to see it every. fucking. time. the limitations of LLMs are brought up. Every single time, there is someone saying yeah, you didn't use web search / deep thinking / gpt-5-plus-pro-turbo-420B.

It's absurd. You can trivially spend 2 minutes on ChatGPT and it will hallucinate some factually incorrect answer. Why, why, why always this cope.


The 2 minutes thing feels out by one or two orders of magnitude.

But then - I'm constantly amazed by how everyone's subjective and presumably honest accounts of their experiences with AI differ so wildly.


You're right, it does. It's more like 10 seconds.

Well, I agree with you that LLMs really like to answer with stuff that is not grounded in reality, but I also agree with the parent that grounding them in something else absolutely helps. I let the LLM invent garbage however it feels like, but then tell it to only ever answer citing valid, existing URLs. Suddenly it generates claims that something doesn't exist or that it truly doesn't know.

This really results in zero hallucinations (but the content is also mostly not generated by an LLM).


Well I don't know what to say, except that this is obviously, trivially, not true. The LLM will plain make up links that don't exist, or at least "summarise" an existing link by just making stuff up that is tangentially (but plausibly) related to the link. It's impossible to have used LLMs for this purpose for more than a quarter of an hour and not have seen this.

I've never had the case that a URL did not exist. For me it shows something like "generating web search", so I guess it tries to fetch the URL first before suggesting it. LLMs like to give tangentially related links, but this is typically paired with a sentence saying that the link I actually asked for does not exist.

> It's impossible to have used LLMs for this purpose for more than a quarter of an hour and not have seen this.

You may be generalizing too much from your experience.


>We also don't know, in situations like this, whether all of or how much of the research is true.

That's perfectly fine since we don't know how much of the original research is true either: https://en.wikipedia.org/wiki/Replication_crisis

If I waste three months doing a manual literature review on papers which are fraudulent with 100% accuracy have I gained anything compared to doing it with an AI in 20 minutes with 60% accuracy?


> If I waste three months doing a manual literature review on papers which are fraudulent with 100% accuracy have I gained anything compared to doing it with an AI in 20 minutes with 60% accuracy?

You don't see how adding a 40% error rate on top of that makes things worse? Your 20-minute study there made you less informed, not more; at least the fraudulent papers teach you what the community thinks about the topic, while the AI in your example just misinforms you about the world.

For example, while reading all those fraudulent papers you will probably notice that they don't add up and thus figure out that they are fraudulent. The AI study, however, will likely try to connect the data in them so that it makes sense (due to how LLMs work, they have seen more examples that connect and make sense than not, so hallucinations will go in that direction). Then the studies will not seem as fraudulent as they actually are, and you might even miss the fraud entirely because the AI hallucinates arguments in favor of the studies.


You are assuming my time is free and then comparing the difference between spending infinite time on something and minimal time.

Uninformed is better than misinformed; it's better not to do that research at all than to accept an error rate as high as in your example. AI models often have a much lower error rate than you said there for certain topics, but a 40% error rate firmly puts it where you are better off doing nothing at all than using it for research.

> I don’t know how much.

If you're not willing to measure how it helps you, then it's probably not worth it.

I would go even further: if the effort of measuring is not feasible, then it's probably not worth it.

That is more targeted at companies than you specifically, but it also works as an individual reflection.

In the individual reflection, it works like this: you should think "how can I prove to myself that I'm not being bamboozled?". Once you acquire that proof, it should be easy to share it with others. If it's not, it's probably not a good proof (like an anecdote).

I already said this, and I'll say it again: record yourself using LLMs. Then watch the recording. Is it that good? Notice that I am removing myself from the equation here. I will not judge how good it is; you're going to do that yourself.


There is a difference between confirming that something is worth it and quantifying the benefit though. One only requires satisfying a lower bound, the other requires an exact number.

For example, I use a $30/month chatbot subscription for various utility tasks. If I value my time above $60/hour, I need to save half an hour each month (a minute a day) to make the investment worth it. That is absolutely the case: just with simple googleable questions and light research tasks I save much more than 7 minutes a week.

But how much do I actually save? What exactly is my time actually worth? Those are much more difficult questions to answer.
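
To make that lower bound concrete, a throwaway sketch of the arithmetic (only the $30/month and $60/hour figures from above are assumed):

  // Break-even time saved for a subscription, given an hourly value of time.
  package main

  import "fmt"

  func main() {
      monthlyCost := 30.0 // USD/month for the chatbot subscription
      hourlyValue := 60.0 // USD/hour I value my time at

      // Minutes per month the tool must save to pay for itself.
      breakEven := monthlyCost / hourlyValue * 60
      fmt.Printf("break-even: %.0f min/month (~%.0f min/week)\n",
          breakEven, breakEven/30*7)
  }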


The user @brailsafe gave an answer that embodies some things I was going to say.

You're accounting for the time wins, not accounting for the time losses.

For a human chat user, that's when the LLM fails an answer or answers wrong. For an LLM coder, that's when context rot creeps in and you have to restart your work, and so on.

There are people who don't care much if they are being bamboozled for $30/mo, they have nothing to prove nor grand expectations for the thing. To them, cargo culting might be fun and that's what they extract from the bargain.

I am directing my answers mostly to people, companies or individuals, who have something to prove (evangelists, AI companies, etc). To those, a series of imperceptible small losses that results in debt in the long run is a big problem.

My suggestion (the recording session) also works as a metaphor. That could be, instead of video, metrics about how contexts are discarded. It is, in that sense, also something they can decide to share or not, and the extent to which they share it should be a sign of confidence in their product.

Makes sense?


For me it's less important how much time I think I save on any discrete task than how much time I net overall, accounting for how long the set of tasks I'd be working on would otherwise have taken had I just done them manually. Right now, that means debits and credits in the ledger of time. Sometimes I gain a lot on tasks I probably otherwise wouldn't have done, but also don't gain much overall by doing, and sometimes I lose a ton of time simply by leaning on a loop of re-doing inaccurate agent work in a way that's actually more time-intensive than if I had internalized the system in working memory and produced functionality more slowly.

If I save an hour, but lose 6, when I'd otherwise have spent 2, then I net -4, but sometimes overall it's positive, so the value is more ambiguous. If my employer didn't pay for the tools, I really don't know whether I would.

A good price and conservative usage pattern might net more.


I just did it.

You were right.

It is, in fact, that good.


You could have recorded, found it to be good, and not shared the news, only used it for yourself. But you decided to share only the news, not the recording. That tells me something.

To be more clear, I can take this argument further. I promise you that if you share the recording that led you to believe that, I will not judge it. In fact, I will do the opposite and focus on the people who judge it, trying my best to make the recording look good and pointing out whoever is nitpicking.


"I don't think I've ever seen anyone say they're not useful."

That's because no one has said that

"AI" hype is the issue, not "AI"

The hype machine and its followers have no tolerance for skepticism

Any perceived skepticism of "AI", no matter how reasonable, triggers absurd accusations

The author, like many others, tries to avoid the kneejerk defensiveness of "AI" hype subscribers:

"Don't get me wrong: I am not denying the extraordinary potential of AI to change aspects of our world, nor that savvy entrepreneurs, companies and investors will win very big. It will - and they will."

But this does not work. There is zero tolerance for skepticism. All disbelief must be countered

"Crypto" hype was like this, before one of its ringleaders went to prison

It's unlikely that fraud will be prosecuted in the current political environment

Fasten your seatbelts


Something not being useful is distinct from it having no uses. It could well be the case that the use of AI creates more damage than it does good. Many people have found it a useful tool to create the appearance of work where none is happening.

The thing with the hype is it's always the same hype. "If you can just 3D print another 3D printer ..." "Apps are dead, everything will be AJAX" etc. I no longer believe the hype itself warrants attention or pushback. Let the hype boys raise money. No need to protect naive VCs.

> Let the hype boys raise money. No need to protect naive VCs.

I genuinely 100% believe that the ability of hype boys to raise money is harming the economy and all of us. Whatever the structural reason for it, it would be best to end it.


But if the hype boys manage to capture big portions of the market (Microsoft, Amazon, etc...) it starts affecting pensions and retirement accounts. The next few years are gonna be rough because of this hype.

Those who claim they're not useful usually link it to something like "never trust them because of hallucinations", or backtrack when called out ("yes, I should have added details"), or speak of the problems outweighing the usefulness, hence "not useful", etc. But online, people do make this statement.

>I don't think I've ever seen anyone say they're not useful.

https://news.ycombinator.com/item?id=45577203

There are thousands and thousands of comments just like this on this site. I would dare say tens of thousands. They regularly appear in any AI-related discussion.

I've been involved in many threads on here where devs with Very Important Work announce that none of the AI tools are useful for them or for anyone with Real Problems, and at best they work for copy/paste junior devs who don't know what they're doing and are doing trivial work. This is right after they declare that anyone that isn't building a giant monolithic PHP app just like them are trend-chasers who are "cargo culting, like some tribe or something".

>I also think they're over-hyped and that the current frenzy will end badly (global economically speaking)

In a world where Tesla is a trillion-dollar company based upon vapourware, and the president of the largest economy (for now) is launching shitcoins and taking bribes through crypto, and every Western country saw a massive real-estate ramp-up from unmetered mass migration, and Bitcoin is a $2T "currency" that has literally zero real-world use beyond betting on itself, and sites like Polymarket exist for insiders to scam foolish rube outsiders out of their money, and... Dude, the AI bubble doesn't even remotely measure up.


Fair enough, I may have conflated "there's an AI bubble" with "AIs aren't useful".

My employer pays for Claude Pro access, and if they stopped paying tomorrow I'd consider paying for it myself. Although, it's much more likely that I'd start self-hosting them.

So that's what it's worth to me, say $2500 USD in hardware over the next 3 years.

I'd love to hear what your take on this is.


$2500 is a relatively small investment for any sort of useful tool over 3 years, but that seems very low to me for this specific self-hosting endeavor.

My dude, there is a small but weirdly dedicated group of people on this site that are hellbent on demanding "proof" that the wins we've personally gained from using LLMs in an intelligent way are real. It's actually been kind of exhausting, leading me to not weigh in on many threads.

Because there's a lot of evidence that people tend to overestimate/overstate how useful LLMs are.

Everyone says "I wrote this thing using AI" but most of the time reading the prompt would be just as useful as reading the final product.

Everyone says "I wrote this large codebase using AI" but most of the time the code is unmaintainable and probably could have been implemented with much less code by a real human, and also the final software isn't actually ready for prod yet.

Everyone says "I find AI coding very useful" and neglects to mention that they are making small adhoc scripts, or they're in a domain that's mostly boilerplate anyways (e.g. some parts of web dev).

The one killer application of LLMs seems to be text summarization. Everything else that I have seen is either a niche domain that doesn't apply to the vast majority of people, a final product that is slop and shouldn't been made in the first place, or minor gains that are worthwhile but nowhere near as groundbreaking as people claim.

To be clear, I think LLMs are useful, and I personally use them regularly. But I've gained at most 5% productivity from them (likely much less). For me, it's exhausting to keep on trying to realize these gains everyone is talking about, while every time I dig into someone claiming to get massive gains I find that the actual impact is highly questionable.


Your position implies that we need to prove that we're not smoking our own supply. I would argue that you are the one who should prove that we're not working (conservatively) 5-8x faster.

The most telling part is when you said "most of the time reading the prompt". That strongly implies that you're attempting to one-shot whatever it is that you're working on.

There is no "the prompt" in my current application. It's a 275k LoC ESP-IDF app spread across ~30 components that interact via FreeRTOS mechanisms as well as an app-wide event bus. It manages a non-blocking UI, handles IO over multiple protocols, and drives an OLED using a customized version of lvgl. It is, by any estimation, a serious and non-trivial application, and it was almost entirely crafted by LLM coding models being closely driven by yours truly across several hundred distinct Cursor conversations.

It's probably taken me 10% of the time it would have taken me to do by hand, and that's precisely because I lean on it so heavily for initial buildout, thoughtful troubleshooting (it is never tired, never not available, and also knows more than I do about electronics as a bonus) and the occasional large cross-component refactor.

I don't suspect that you're wrong. I know that you're wrong.


> it's the claims that it's not useful

I think the reason is that it depends on what impact metrics you want to measure. "Usefulness" is in the eye of the beholder. You have to decide what metric you consider "useful".

If it's company profit for example, maybe the data shows it's not yet useful and not having impact on profit.

If it's the level of concentration needed by engineers to code, then you probably can see that metric having improved as less mental effort is needed to accomplish the same thing. If that's the impact you care about, you can consider it "useful".

Etc.


> It 100% is [useful]

It's worth disambiguating between "worth $50b of investment" useful and "worth $1t of investment" useful


For perspective, there are 10 companies with a market cap over $1T. Is the value of LLMs greater than Tesla's? Absolutely.

The problem of course is that plenty of that $1T in investment will go to stupid investments. The people whose investments pan out will be the next generation of Zuckerbergs. The rest will be remembered like MySpace or Webvan.


I'll add that MSFT, AAPL, GOOGL, AMZN, and META generated >$450B in net income in the last 4 quarters. It can't be overstated how much profit they can burn on AI without losing money.

To be fair, while the incremental value of each additional year that Tesla remains in existence may not be so great, it did finally change the conventional wisdom about the viability of electric vehicles which will continue to have substantial impact.

Furthermore, the price of the most recently sold share times the number of shares outstanding does not represent the total R&D or spending required to make Teslas.


Especially when, as it is currently in vogue to observe, the difference between $50b and $1t is roughly $1t.

Up to 2 significant figures...

> Supposedly the performance of Qwen3-Coder is comparable to the likes of Sonnet 4. If I invest in a homelab that can host something like Qwen3 I'll recoup my costs in about 20 months without having to rely on Anthropic.

You can always try it via openrouter without investing in the home setup first. That allows you to evaluate whether it hits your quality bar or not, and is much cheaper. It is less fun than self-hosting though.
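
A minimal sketch of what that looks like, assuming OpenRouter's OpenAI-compatible chat completions endpoint and guessing at the Qwen3 Coder model slug (check their catalog for the exact name):

  // Trying a hosted open-weights model via OpenRouter before buying hardware.
  package main

  import (
      "bytes"
      "fmt"
      "io"
      "net/http"
      "os"
  )

  func main() {
      // The model slug is an assumption; see the OpenRouter model list.
      body := []byte(`{
          "model": "qwen/qwen3-coder",
          "messages": [{"role": "user", "content": "Write a binary search in Go."}]
      }`)

      req, err := http.NewRequest("POST",
          "https://openrouter.ai/api/v1/chat/completions", bytes.NewReader(body))
      if err != nil {
          panic(err)
      }
      req.Header.Set("Authorization", "Bearer "+os.Getenv("OPENROUTER_API_KEY"))
      req.Header.Set("Content-Type", "application/json")

      resp, err := http.DefaultClient.Do(req)
      if err != nil {
          panic(err)
      }
      defer resp.Body.Close()

      raw, _ := io.ReadAll(resp.Body)
      fmt.Println(string(raw)) // raw JSON response containing the completion
  }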


The issue is that the field is still moving too fast - in 20 months, you might break even on costs, but the LLMs you are able to run might be 20 months behind "state of the art". As long as providers keep selling cheap inference, I'm holding out.

I agree, but also don't underestimate the value of developing a competency in self-hosting a model.

Dan Luu has a relevant post on this that tracks with my experience https://danluu.com/in-house/


Agreed. Right now this stuff is being massively subsidized in our favor. I'm taking advantage of that.

That's where I am at too. Also it's not clear what's going to happen with hardware prices. I think there's a huge demand for hardware right now, but it should fall off at some point hopefully.

The gap between local models and SOTA is around 6 months and it's either steady or dropping. (Obviously this depends on your benchmark and preferences.)

Seriously? So I can run the best models from 2024 at home now?

For example, what would I need to run Open AI's o1 model from 2024 at home? Are there good guides for setting this up?


It's not the same model, but for example GPT-OSS-120B is smarter than o1. The guide: buy 128 GB of VRAM, then install LM Studio.

Getting to 128 GB of VRAM with NVIDIA 5090s (4 × 32 GB) is about $13k. It doesn't make any sense to run that at home when you can pay OpenAI $20/month to use it (it would take more than 50 years to spend $13k at OpenAI this way).

So technically you might be able to run a six month old model at home, but it would be foolish to do so from a financial point of view.

Or is there a way to get 128 GB of VRAM for a lot less than that?


Ryzen AI Max is $2,000, M4 Max is $3,500, and DGX Spark is $4,000. Still not really economically feasible, but I see it as an insurance policy. And that's for the most demanding local models; smaller models will run on any PC.

Fortunately the models are increasing in efficiency about as fast as they are increasing in performance, so your homelab surprisingly doesn’t become out of date as fast as you might expect. However, I expect there will also be very capable machines like 1TB or 2TB Mac Studio M5 or M6 Ultras within a year or two.

It's hella useful. I use Cursor several times a week (and I'm not even working as a dev full time rn), and ChatGPT is my daily driver.

Yet, it's weird to me that we're 3 years into this "revolution" and I can't get a decent slideshow from an LLM without having to practically build a framework for doing so.


It is a focus, data, and benchmarking problem. If someone comes up with good benchmarks, which means having a good dataset, and gets some publicity around them, they can attract the frontier labs' attention to focus training and optimization effort on making the models better for that benchmark. This is how most of the capabilities we have today have become useful. Maybe there is some emergent initial detection of utility, but the refinement comes from labs beating others on the benchmarks.

So we need a slideshow benchmark, and I think we'd see rapid improvement. LLMs are actually OK at building HTML decks - not great, but OK. Enough so that if there were some good objective criteria to tune things toward, I think the last-mile kinks would get worked out (formats, object/text overlaps). The raw content is mainly a function of the core intelligence of the model, so that wouldn't be impacted (if you can get it to build a good bullet-point markdown of your presentation today, it would be just as good as a deck, though maybe not as visually compelling as you'd like). Also, this might need to be an agentic benchmark, to allow for both text and image creation and other considerations like data sourcing - which is why everyone doing this ends up building their own mini framework.

A ton of the reinforcement-type training work is really just aligning the vague commands a user would give with the capability the model would produce given a much more fleshed-out prompt.


They are useful, but I find it is only slightly more convenient than a Google search. Losing something like GPS on my phone would be a much bigger disruption to my life.

I used Qwen3-480B-Coder with Cerebras and it was not very good for my use case. You can run these models online first to see if they will work for you. I recommend you try that first.

I had a hilarious exchange on here where I used an LLM to explain to a poster at length why they fundamentally didn't understand what I said. It did a bang up job. The poster, and a lot of other people, got mad I used AI and they still didn't understand my original post, or the AI explanation.

LLMs aren't terribly useful to people who fundamentally can't read. When those people can also type very fast you get the current situation.


> I used an LLM to explain to a poster at length why they fundamentally didn't understand what I said. It did a bang up job. The poster still didn't understand my original post.

It didn't do a bang-up job if the poster still didn't understand you, so sorry, this example doesn't prove what you think it does.

You have to measure actual results, your own take will always be biased so you can't say "I thought it was great but it didn't work" and expect people to get convinced by that.

Edit: And if that doesn't convince you, why not read what this AI has to say about your post, if you like them so much you should read this right and acknowledge you were wrong just like you expected those people to: https://chatgpt.com/s/t_68f2ae740f98819183539767b921965b


Claude, explain the fallacy fallacy to someone fond of pointing out fallacies in arguments:

#### *The Fallacy Fallacy: A Metacognitive Error in Logical Analysis*

The fallacy fallacy, also known as the argument from fallacy or argumentum ad logicam, represents a second-order logical error wherein one incorrectly infers that a conclusion must be false solely because it has been argued through fallacious reasoning. This metacognitive error constitutes a significant impediment to rigorous philosophical discourse and warrants careful examination.

#### *Theoretical Framework and Definition*

Within the domain of informal logic, fallacies constitute "mistakes of reasoning, as opposed to making mistakes that are of a factual nature". The fallacy fallacy emerges when interlocutors conflate the validity of argumentative structure with the truth value of propositional content. Specifically, this error manifests when one advances the following invalid inference pattern:

1. Argument X contains logical fallacy F

2. Therefore, the conclusion C of argument X is false

This inference pattern itself represents a non sequitur, as the presence of fallacious reasoning does not necessarily bear upon the truth or falsity of the conclusion in question.

#### *Epistemological Implications*

The commission of the fallacy fallacy reveals a fundamental misunderstanding of the relationship between logical validity and factual accuracy. *Truth values of propositions exist independently of the quality of arguments marshaled in their support*. A proposition may be demonstrably true despite being defended through specious reasoning, just as a false proposition may be supported by formally valid argumentation with false premises.

Consider the following syllogistic example:

- Major premise: All mammals are warm-blooded

- Minor premise: Dogs are mammals because they bark

- Conclusion: Dogs are warm-blooded

While the minor premise employs irrelevant reasoning (dogs' classification as mammals is unrelated to their vocalization), the conclusion remains factually correct.

#### *Methodological Considerations for Critical Analysis*

Scholars engaged in the identification of logical fallacies must exercise epistemic humility regarding the scope of their critique. As noted in the academic literature, "fallacies are common errors in reasoning that will undermine the logic of your argument", yet this undermining pertains exclusively to the argumentative structure rather than to the ontological status of the conclusion.

The appropriate scholarly response to encountering fallacious reasoning involves:

1. *Methodological separation* - Distinguishing between the evaluation of argumentative form and the assessment of propositional content

2. *Constructive engagement* - Requesting alternative justification rather than dismissing conclusions outright

3. *Epistemic charity* - Acknowledging that interlocutors may possess valid intuitions despite articulating them through flawed logical frameworks

#### *Conclusion*

The fallacy fallacy represents a particularly insidious form of intellectual error, as it masquerades as sophisticated logical analysis while itself committing a fundamental category mistake. Academics and scholars must remain vigilant against this metacognitive trap, recognizing that the identification of fallacious reasoning, while valuable for improving argumentative rigor, does not constitute sufficient grounds for rejecting the truth claims embedded within poorly constructed arguments. The pursuit of truth demands that we evaluate propositions on their merits, independent of the quality of their initial presentation.


See, the problem you run into is that rather than simply saying "just because there was a fallacy in my argument, it doesn't make it false", you pasted this long, barely digestible block of text.

In the other conversation, "you" "explain" why other people are misunderstanding you, rather than trying to make the argument clearer. Unfortunately the flagged post is no longer available, but I'm curious how such a simple conversation spiraled down so badly.

I even tend to agree with your initial argument so I’m quite convinced that none of this was necessary.


The problem with self-hosting is that it increases the friction of swapping models to use whatever is SOTA or whatever fits your purpose best.

Also, I've heard from others that the Qwen models are a bit too overfit to the benchmarks and that their real-life usage is not as impressive as they would appear on the benchmarks.


Switching models when running locally is fairly easy - as long as you have them downloaded, you can switch them in and out with just a config setting. I can't quite remember, but you may need to rebuild the vectorstore when switching, though.

LangChain has the embeddings for major providers:

  def build_vectorstore(docs):
    """
    Create a vectorstore from documents using the configured embedding model.
    """
    # Choose embedding model (cfg is the surrounding app's settings module)
    if cfg.EMBED_MODEL.lower() == "openai":
        from langchain_openai import OpenAIEmbeddings
        embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
    elif cfg.EMBED_MODEL.lower() == "huggingface":
        from langchain_community.embeddings import HuggingFaceEmbeddings
        embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
    elif cfg.EMBED_MODEL.lower() == "nomic-embed-text":
        from langchain_ollama import OllamaEmbeddings
        embeddings = OllamaEmbeddings(model=cfg.EMBED_MODEL)
    else:
        raise ValueError(f"Unsupported embedding model: {cfg.EMBED_MODEL}")

    # Build and return the store (FAISS here, but any vectorstore works)
    from langchain_community.vectorstores import FAISS
    return FAISS.from_documents(docs, embeddings)

Qwen3 Coder unfortunately isn't on par with Sonnet, no matter what the benchmarks say. GLM-4.6 does feel pretty competitive though.

You'll need a pretty expensive homelab to run it though... I'd be surprised if you could do it at long context for only 20 months' worth of Sonnet usage.


> If I'm tired of one thing related to AI/llm/chatbots it's the claims that it's not useful.

That is the best example of a straw-man argument I've seen this year. I enjoy reading discussions on LLMs and have seen a huge number of arguments, some reasonable and some ridiculous, but one thing I haven't seen is someone claiming that LLMs are not useful. We can discuss usefulness for a particular purpose, or the level of their fitness for it, but not the fact that millions of people find LLMs useful enough to pay for them.


> If I'm tired of one thing related to AI/llm/chatbots it's the claims that it's not useful. It 100% is. We have to separate the massive financial machinations from the actual tech.

It's indisputable that the tech is and can be very useful, but it's also surrounded by a bubble of grifters and opportunists riding the hype and money train.

The sooner we start ignoring the "AI", "ASI", "AGI", anthropomorphization, and every other snake oil these people are peddling, the sooner we can focus on practical applications of the tech, which are numerous.


> the claims that it's not useful

There are many credible claims that not only is it not useful, but that it is actually causing serious damage.


The other thing that's tiring is talking about how AI is a bubble as if that's an indictment of AI.

Being a bubble is a statement about the value of the stock market, not about the technology. There was a dotcom bubble, but that does not mean the internet wasn't valuable. And if you bought at the top of the dotcom bubble you'd be much wealthier now than you were when you bought. But it would have taken you a significant time to break even.


If you bought ETFs, yes, but not if you bought Pets.com and Yahoo.

Right, which is a distinction that matters if you have a sensible view of what it means to be in a bubble.

But many people talking about AI being a bubble aren't trying to figure out which ticker is going to win in the long run, they're trying to convey a belief that AI is bogus altogether.

There's widespread agreement that nobody knows whether the AI valuations we see are right. What I'm saying is tiring is people who confuse that idea with an indictment of the technology.


> If I invest in a homelab that can host something like Qwen3 I'll recoup my costs in about 20 months without having to rely on Anthropic

For me it's equally that I don't trust any of these service providers to keep maintaining whatever service or model I'm relying on. Imagine if I build a whole entire process and then the bubble bursts and they either take away what I'm using or start charging outrageous amounts for it.

I feel we are well past the point where the base technology is useful enough, and all the work is in how you implement and adapt it into your process / workflow. A new model coming out that is 3% better is relatively meaningless compared to me figuring out how to better integrate what I already have, which might give me a 20% bump for very little effort.

So at this point all I really want is stability in the tech so I can optimise everything else. Constant churn of hosted providers thrusting change at me every second day is actively harmful to my productive use of it at this point. Hence I want local models so I can just tune out the noise and focus on getting things done.


It might be "useful" as in "it has a non-zero number of use cases" and still be massively overhyped (by orders of magnitude, in my opinion).

I guess there are use cases for it, even if we discount undisputed net negatives like the proliferation of slop online, scam calls, deepfakes, etc. That doesn't mean it provides an amount of utility that justifies pivoting a significant portion of world capital and production towards that end.

It will never be AGI, by the way. We are way past the inflection point of the logistic curve, so this is more or less what it is.


> Supposedly the performance of Qwen3-Coder is comparable to the likes of Sonnet 4. If I invest in a homelab that can host something like Qwen3 I'll recoup my costs in about 20 months without having to rely on Anthropic.

Presently, look up the Cerebras Coder subscription. It's cut down my reliance on paying per token by about 80%, since the model is good for most development tasks and the rate limits are such that I never hit them in a day, alongside being faster than anything else out there.

Lots of folks also just explore new models on OpenRouter as they come out, although it doesn't seem to have caching support, so it can get expensive.

Aside from that, self-hosting can be worth it but you need lots of memory and beefy compute to have good performance without quantizing things super far. There’s a really big difference between the 30B and 480B versions of Qwen Coder and while the smaller models are getting better, feels like there are diminishing returns there.


You need at least an RTX 6000 Pro, maybe two, to run local models at that level. It's probably only worth it if you plan on doing other workloads like fine-tuning or generating a lot of synthetic data.

Does that include electricity and maintenance costs?

Few people say they are not useful. But when people like me say they aren't reliable or worthy of trust, AI fanboys like to pretend we are saying something else.


