The lede is being missed imo. gpt-oss:20b is a top ten model (on MMLU (right beh...

int_19h · 2025-08-05T23:45:59 1754437559

I tried 20b locally and it couldn't reason a way out of a basic river crossing puzzle with labels changed. That is not anywhere near SOTA. In fact it's worse than many local models that can do it, including e.g. QwQ-32b.

dragonwriter · 2025-08-06T07:49:10 1754466550

> In fact it's worse than many local models that can do it, including e.g. QwQ-32b.

I'm not going to be surprised that a 20B 4/32 MoE model (3.6B parameters activated) is less capable at a particular problem category than a 32B dense model, and its quite possible for both to be SOTA, as state of the art at different scale (both parameter count and speed which scales with active resource needs) is going to have different capabilities. TANSTAAFL.

__alexs · 2025-08-06T08:58:43 1754470723

Dear reader do not be alarmed. That rumbling sound you are hearing is simply the frantic movement of goal posts.

tomhow · 2025-08-06T13:51:53 1754488313

Please don't post snark like this on HN. If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.

whynotminot · 2025-08-06T12:32:26 1754483546

He’s saying there’s different goalposts at different model sizes. Is that unreasonable?

lannisterstark · 2025-08-06T15:44:27 1754495067

This isn't reddit.

robwwilliams · 2025-08-06T00:05:29 1754438729

Well river crossings are one type of problem. My real world problem is proofing and minor editing of text. A version installed on my portable would be great.

cosmojg · 2025-08-06T03:48:56 1754452136

Have you tried Google's Gemma-3n-E4B-IT in their AI Edge Gallery app? It's the first local model that's really blown me away with its power-to-speed ratio on a mobile device.

See: https://github.com/google-ai-edge/gallery/releases/tag/1.0.3

1123581321 · 2025-08-06T05:29:22 1754458162

Dozens of locally runnable models can already do that.

mark_l_watson · 2025-08-06T13:22:33 1754486553

Yes, I always evaluate models on my own prompts and use cases. I glance at evaluation postings but I am also only interested in my own use cases.

golol · 2025-08-06T06:41:48 1754462508

I heard the OSSmodels are terrible at anything other than math, code etc.

CMay · 2025-08-06T09:23:35 1754472215

The 20b solved the wolf, goat, cabbage river crossing puzzle set to high reasoning for me without needing to use a system prompt that encourages critical thinking. It managed it using multiple different recommended settings, from temperatures of 0.6 up to 1.0, etc.

Other models have generally failed that without a system prompt that encourages rigorous thinking. Each of the reasoning settings may very well have thinking guidance baked in there that do something similar, though.

I'm not sure it says that much that it can solve this, since it's public and can be in training data. It does say something if it can't solve it, though. So, for what it's worth, it solves it reliably for me.

Think this is the smallest model I've seen solve it.

mgoetzke · 2025-08-06T11:29:54 1754479794

Try changing the names of the objects. eg fox, hen, seeds for examples

aspect0545 · 2025-08-06T10:40:47 1754476847

But was it reasoning or did it solve this because it was parting it‘s training data?

CMay · 2025-08-06T18:08:36 1754503716

Maybe both? I tried using different animals, scenarios, solvable versions, unsolvable versions, it gave me the correct answer with high reasoning in LM Studio. It does tell me it's in the training data, but it does reason through things fairly well. It doesn't feel like it's just reciting the solution and picks up on nuances around the variations.

If I switch from LM Studio to Ollama and run it using the CLI without changing anything, it will fail and it's harder to set the reasoning amount. If I use the Ollama UI, it seems to do a lot less reasoning. Not sure the Ollama UI has an option anywhere to adjust the system prompt so I can set the reasoning to high. In LM Studio even with the Unsloth GGUF, I can set the reasoning to high in the system prompt even though LM Studio won't give you the reasoning amount button to choose it with on that version.

ben_w · 2025-08-06T11:09:35 1754478575

Allow me to answer with a rhetorical question:

S8O2bm5lbiBTaWUgZGllc2VuIFNhdHogbGVzZW4sIGRhIGVyIGluIEJhc2UtNjQta29kaWVydGVtIERldXRzY2ggdm9ybGllZ3Q/IEhhYmVuIFNpZSBkaWUgQW50d29ydCB2b24gR3J1bmQgYXVmIGVyc2NobG9zc2VuIG9kZXIgaGFiZW4gU2llIG51ciBCYXNlIDY0IGVya2FubnQgdW5kIGRhcyBFcmdlYm5pcyBkYW5uIGluIEdvb2dsZSBUcmFuc2xhdGUgZWluZ2VnZWJlbj8gV2FzIGlzdCDDvGJlcmhhdXB0IOKAnnJlYXNvbmluZ+KAnCwgd2VubiBtYW4gbmljaHQgZGFzIEdlbGVybnRlIGF1cyBlaW5lbSBGYWxsIGF1ZiBlaW5lbiBhbmRlcmVuIGFud2VuZGV0Pw==

And yes, that's a question. Well, three, but still.

danbruc · 2025-08-06T12:18:10 1754482690

In case of the river puzzle there is a huge difference between repeating an answer that you read somewhere and figuring it out on your own, one requires reasoning the other does not. If you swap out the animals involved, then you need some reasoning to recognize the identical structure of the puzzles and map between the two sets of animals. But you are still very far from the amount of reasoning required to solve the puzzle without already knowing the answer.

You can do it brute force, that requires again more reasoning than mapping between structurally identical puzzles. And finally you can solve it systematically, that requires the largest amount of reasoning. And in all those cases there is a crucial difference between blindly repeating the steps of a solution that you have seen before and coming up with that solution on your own even if you can not tell the two cases apart by looking at the output which would be identical.

daveguy · 2025-08-06T16:10:23 1754496623

As mgoetzke challenges, change the names of the items to something different, but the same puzzle. If it fails with "fox, hen, seeds" instead of "wolf, goat, cabbage" then it wasn't reasoning or applying something learned to another case. It was just regurgitating from the training data.

odo1242 · 2025-08-06T16:38:03 1754498283

(Decoded, if anyone's wondering):

> Können Sie diesen Satz lesen, da er in Base-64-kodiertem Deutsch vorliegt? Haben Sie die Antwort von Grund auf erschlossen oder haben Sie nur Base 64 erkannt und das Ergebnis dann in Google Translate eingegeben? Was ist überhaupt „reasoning“, wenn man nicht das Gelernte aus einem Fall auf einen anderen anwendet?

>

> Can you read this sentence, since it's in Base-64 encoded German? Did you deduce the answer from scratch, or did you just recognize Base 64 and then enter the result into Google Translate? What is "reasoning" anyway if you don't apply what you've learned from one case to another?

gf000 · 2025-08-08T12:00:56 1754654456

Just as a random data point, gpt 4-1 managed to "solve" it on the first run with a basic prompt like "Solve this riddle for me"

tanseydavid · 2025-08-06T15:51:38 1754495498

<well-played>

9rx · 2025-08-06T05:32:29 1754458349

I tried the two US presidents having the same parents one, and while it understood the intent, it got caught up in being adamant that Joe Biden won the election in 2024 and anything I do to try and tell it otherwise is dismissed as being false and expresses quite definitely that I need to do proper research with legitimate sources.

mark_l_watson · 2025-08-06T13:28:29 1754486909

I think the lesson is: smaller models hallucinate more, so only use them in your applications where you load up large prompts with specific data to reason about. Then even the small Google gemma3n 4B model can be amazingly useful.

I use the SOTA models from Google and OpenAI mostly for getting feedback on ideas, helping me think through designs, and sometimes for coding.

Your question is clearly best answered using a large commercial model with a web search tool. That said, integrating a local model with a home built interface to something like the Brave search API can be effective but I no longer make the effort.

9rx · 2025-08-06T14:10:53 1754489453

> think the lesson is: smaller models hallucinate more

The interesting part isn't the hallucination, but the sheer unwillingness to take in new information.

dragonwriter · 2025-08-06T15:17:41 1754493461

Might have dug in less on the confabulation about information outside of its knowledge cutoff if the new information weren't offered with support from a user “hallucination” about information within its knowledge cutoff. More detail:

https://news.ycombinator.com/item?id=44809145

9rx · 2025-08-06T15:34:35 1754494475

> Might have dug in less...

The digging in at all is what is interesting. Like an earlier comment alluded to, the presumptive value of these tools is being able to feed it your own information where that information is to be considered authoritative.

> More detail: [...] "He, in fact, did run for reelection"

A slow walk, maybe. He was in no condition to run. That is why he ultimately dropped out. But, really, that statement is just a silly game of semantics. "Run", when used in hindsight, often implies completion. This LLM model even says so too. If a model doesn't understand nuance, that is also interesting.

aaroninsf · 2025-08-06T17:41:55 1754502115

Have we considered the possibility that maybe it knows something we don't.

freehorse · 2025-08-06T07:38:03 1754465883

I mean I would hardly blame the specific model, Anthropic has a specific mention in their system prompts on trump winning. For some reason llms get confused with this one.

diggan · 2025-08-06T12:39:40 1754483980

I noticed the same when asking various LLMs to summarize and explaining some "Presidential Actions" (from https://www.whitehouse.gov/presidential-actions/), most of them answer "This is just theoretical, since no such executive actions actually exists, but assuming something like that would happen in the future, it would mean ..." while a few has returned something like "This fictional executive action would be illegal so I cannot summarize the content", even when I provide direct links and they fetch the content themselves. Not exactly sure why that is.

quatonion · 2025-08-06T10:28:38 1754476118

I think models generally have cognitive dissonance regarding world politics. They are also always constantly shocked when you tell them what date it is, and go very quiet.

DoctorOetker · 2025-08-06T14:21:35 1754490095

can you give some guidelines to achieve the quiting down? they emit less tokens afterward?

jari_mustonen · 2025-08-06T09:09:09 1754471349

It's the political bias in the training material. No surprise there.

regularfry · 2025-08-06T09:21:49 1754472109

More likely is that there's a lot of source material having to very stridently assert that Trump didn't win in 2020, and it's generalising to a later year. That's not political bias.

ben_w · 2025-08-06T11:13:33 1754478813

It's also extremely weird that Trump did win in 2024.

If I'd been in a coma from Jan 1 2024 to today, and woke up to people saying Trump was president again, I'd think they were pulling my leg or testing my brain function to see if I'd become gullible.

exasperaited · 2025-08-06T14:24:06 1754490246

It's not extremely weird at all.

I, a British liberal leftie who considers this win one of the signs of the coming apocalypse, can tell you why:

Charlie Kirk may be an odious little man but he ran an exceptional ground game, Trump fully captured the Libertarian Party (and amazingly delivered on a promise to them), Trump was well-advised by his son to campaign on Tiktok, etc. etc.

Basically what happened is the 2024 version of the "fifty state strategy", except instead of states, they identified micro-communities, particularly among the extremely online, and crafted messages for each of those. Many of which are actually inconsistent -- their messaging to muslim and jewish communities was inconsistent, their messaging to spanish-speaking communities was inconsistent with their mainstream message etc.

And then a lot of money was pushed into a few battleground states by Musk's operation.

It was a highly technical, broad-spectrum win, built on relentless messaging about persecution etc., and he had the advantage of running against someone he could stereotype very successfully to his base and whose candidacy was late.

Another way to look at why it is not extremely weird, is to look at history. Plenty of examples of jailed or exiled monarchs returning to power, failed coup leaders having another go, criminalised leaders returning to elected office, etc., etc.

Once it was clear Trump still retained control over the GOP in 2022, his re-election became at least quite likely.

bavell · 2025-08-06T12:10:24 1754482224

Unfortunately, it was predictable given the other "choices"

cpursley · 2025-08-06T11:43:05 1754480585

You’re in a bubble. It was no surprise to folks who touch grass on the regular.

ben_w · 2025-08-06T12:18:42 1754482722

> You’re in a bubble.

Sure, all I have to go on from the other side of the Atlantic is the internet. So in that regard, kinda like the AI.

One of the big surprises from the POV of me in Jan 2024, is that I would have anticipated Trump being in prison and not even available as an option for the Republican party to select as a candidate for office, and that even if he had not gone to jail that the Republicans would not want someone who behaved as he did on Jan 6 2021.

OldfieldFund · 2025-08-06T13:44:50 1754487890

you can run for presidency from prison :)

exasperaited · 2025-08-06T14:32:05 1754490725

And he would have. And might have won. Because his I'm-the-most-innocent-persecuted-person messaging was clearly landing.

I am surprised the grandparent poster didn't think Trump's win was at least entirely possible in January 2024, and I am on the same side of the Atlantic. All the indicators were in place.

There was basically no chance he'd actually be in prison by November anyway, because he was doing something else extremely successfully: delaying court cases by playing off his obligations to each of them.

Back then I thought his chances of winning were above 60%, and the betting markets were never ever really in favour of him losing.

amalcon · 2025-08-06T14:46:32 1754491592

I mean, the presumptive GOP primary candidates at the time were Trump, Trump-lite (DeSantis), about 10 Trump sycophants, and Haley. He had demonstrated a high level of influence over GOP primary voters in the 2022 midterm. It had been (internally) obvious since at least then that he was going to win the primary again. I can't speak to how much of that made it across the Atlantic.

Whether he would win the general was an open question then. In the American system, your prediction should never get very far from a coin flip a year out.

username332211 · 2025-08-06T13:53:55 1754488435

I'm pretty sure you are completely correct on the last part. Nobody in Republican management wanted a second Trump term. If the candidate wasn't Trump, Republicans would have had a guaranteed victory. Imagine that infamous debate, but with some 50-year-old youngster facing Joe Biden.

It's the White House that wanted Trump to be candidate. They played Republican primary voters like a fiddle by launching a barrage of transparently political prosecutions just as Republican primaries were starting.

And then they still lost the general election.

FrustratedMonky · 2025-08-06T15:17:15 1754493435

You think the Democratic White House, manipulated Republicans into Voting for Trump. So it is the Democrats fault we have Trump??? Next Level Cope.

adastra22 · 2025-08-09T18:26:45 1754764005

Yes? This was pretty clear at the time—-they thought Trump would be easier to run against than Haley.

9rx · 2025-08-06T15:53:47 1754495627

> You think the Democratic White House, manipulated Republicans into Voting for Trump.

Yes, that is what he thinks. Did you not read the comment? It is, like, uh, right there...

He also explained his reasoning: If Trump didn't win the party race, a more compelling option (the so-called "50-year-old youngster") would have instead, which he claims would have guaranteed a Republican win. In other words, what he is saying that the White House was banking on Trump losing the presidency.

FrustratedMonky · 2025-08-06T17:56:58 1754503018

"explained his reasoning"

Well, I guess, if you are taking some pretty wild speculation as a reasoned explanation. There isn't much hope for you.

Maybe it was because the Democrats new the Earth was about the be invaded by an Alien race , and they also knew Trump was actually a lizard person (native to Earth and thus on their joint side). And Trump would be able to defeat them, so using the secret mind control powers, the Democrats were able to sway the election to allow Trump to win and thus use his advanced Lizard technology to save the planet. Of course, this all happened behind the scenes.

I think if someone is saying the Democrats are so powerful and skillful, that they can sway the election to give Trump the primary win, but then turn around and lose. That does require some clarification.

I'm just hearing a lot of these crazy arguments that somehow everything Trump does is the fault of the Democrats. They are crazy on the face of it. Maybe if people had to clarify their positions they would realize 'oh, yeah, that doesn't make sense'.

9rx · 2025-08-07T05:08:13 1754543293

> if you are taking some pretty wild speculation as a reasoned explanation.

How the heck did you manage to conflate line of reasoning with claims being made?

> There isn't much hope for you.

And fall for the ad hominem fallacy.

> crazy arguments that somehow everything Trump does is the fault of the Democrats

While inventing some weird diatribe about crazy arguments claiming Democrats being at fault for what Trump does, bearing no resemblance to anything else in the discussion.

> They are crazy on the face of it.

As well as introducing some kind of nebulous legion of unidentified "crazy" straw men.

> that doesn't make sense

Couldn't have said it better myself.

> Maybe if people had to clarify their positions

Sad part is that asking for clarification on the position of that earlier comment would have been quite reasonable. There is potentially a lot we can learn from in the missing details. If only you had taken the two extra seconds to understand the comment before replying.

FrustratedMonky · 2025-08-07T12:21:29 1754569289

Ok. If you go back to original. I was bit sarcastic. So a sarcastic question, is probably not taken as a real question.

Like when hearing something out of left field, I think the reply can also be extreme, like saying 'Wuuut????, are you real?".

I do see claims that the Democrats are at fault for us having Trump. Thus anything that happens now is really a knock on effect of Democrats not beating him, so we blame Democrats instead of the people that actually voted for Trump or Trump himself.

So hearing yet another argument about how Democrats are so politically astute that they could swing the Republican primary yet completely fumble later, just seems like more conspiracy theories.

9rx · 2025-08-07T14:23:13 1754576593

> I do see claims that the Democrats are at fault for us having Trump.

If you mean your own comments, yes, I saw that too. Your invented blame made about as much sense as blaming a butterfly who flapped his wings in Africa, but I understand that you were ultimately joking around. Of course, the same holds true for all other comments you supposedly keep seeing. You are not the only one on this earth who dabbles in sarcasm or other forms of comedy, I can assure you.

> Like when hearing something out of left field

The Democrats preferring to race against Trump instead of whomever the alternative would have been may not be actually true, but out in left field? Is this sarcasm again? They beat Trump before. Them seeing him as the weakest opponent at the time wouldn't come as a shock to me. Why you?

> So hearing yet another argument about how Democrats are so politically astute that they could swing the Republican primary

There was nothing to suggest political astuteness. The claim was that they were worried about someone other than Trump winning the Republican ballot and, because of that, they took action to grease the wheels of his victory. Even the most inept group of people would still see the motive and would almost certainly still take action. That it ostensibly worked is just as easily explained by dumb luck.

FrustratedMonky · 2025-08-07T18:22:09 1754590929

It wasn't you, but I was responding to this

>"It's the White House that wanted Trump to be candidate. They played Republican primary voters like a fiddle by launching a barrage of transparently political prosecutions just as Republican primaries were starting."

This really did sound like it " suggest political astuteness"

And, so all the way back, I responded sarcastically. If Democrats could 'Play Republicans like a fiddle", because they wanted Trump to win the primary. Then what happened? Where did all that 'astuteness' go.

9rx · 2025-08-07T19:18:49 1754594329

I don't know what you think "play like a fiddle" means, but in common usage it generally implies that the one being played is gullible.

1. What suggests that astuteness is required to "trick" the gullible? Especially when we are only talking about a single instance of ostensible "success", not even demonstration of repeatability. Dumb luck remains just as likely of an explanation.

2. Under the assumption of easy manipulation as the phrase has been taken to mean, why do you find it unlikely that Trump couldn't have also "tricked" them?

In fact, if we buy into the original comment's premise, the Democrats not recognizing that Trump could just as easily "play them like a fiddle" suggests the exact opposite of being astute from my vantage point. But the view from my vantage point cannot be logically projected onto the original comment. It remains that the original comment gave no such indication either way. Where do you hear this "sound" that you speak of?

FrustratedMonky · 2025-08-07T20:08:38 1754597318

I can't argue with any of that, since technically everyone could have been playing everyone else. Maybe Republicans are easily fooled, or maybe Trump is playing 5-d chess and playing both sides.

I just think 'playing like a fiddle' typically means a lopsided power dynamic where one person has much more knowledge, or skill. So I'd assume it was implying Democrats were in a superior position. Not, that Democrats just got lucky once. This going back and forth pointing fingers about who was playing , seems like too many layers deep.

it feels like this https://www.youtube.com/watch?v=rMz7JBRbmNo

9rx · 2025-08-07T20:11:16 1754597476

> So I'd assume it was implying Democrats were in a superior position.

And that is an equally fair assumption. But it is not written into the original comment. You cannot logically project your own take onto what someone else wrote.

FrustratedMonky · 2025-08-07T20:40:20 1754599220

Since you seem to have a penchant for the pedantic. And I do too. I can retort, of course I can assume the meaning of common language idioms. All language is projecting onto what others say.

9rx · 2025-08-07T21:02:24 1754600544

You can assume a meaning for the sake of your own purposes, but it does not reflect back on the original comment.

Your quip "So it is the Democrats fault we have Trump???" presumably demonstrates that you understand exactly that. After all, if you could have logically projected your interpretation onto the original comment there would have been no need to ask. You'd have already known.

Still, how you managed establish that there was even potential suggestion of "fault" is a head scratcher. Whether or not the account in the original comment is accurate, it clearly only tells a story of what (supposedly) happened. There is no sensible leap from an ostensible historic account to an attribution of blame.

You seem to indicate, if I understand you correctly, that because you randomly had that idea pop into your head (that Democrats are at fault) when reading the comment that the other party must have also been thinking the same thing, but I find that a little unsatisfactory. Perhaps we need to simply dig deeper, freeing ourselves from the immediate context, and look at the line of thinking more broadly. What insights can you offer into your thought processes?

FrustratedMonky · 2025-08-06T12:55:32 1754484932

[flagged]

collingreen · 2025-08-06T16:46:47 1754498807

I've been surprised by a lot of the values that have been put aside for the elections recently, especially when so many of them were yelled about so loudly in the past few elections (not limited to one side but from my perspective one side does seem more severe).

It's also been really interesting to watch a bunch of formerly non partisan values become polarizing like rule of law, integrity, corruption, decorum, body autonomy, the constitution, the national debt, global alliances, sexual assault, "family values", and "the American Dream" tm.

There has always been this (baffling, to me) concept that most folks vote for a candidate "while holding their nose" but it seems like the proportions are getting wildly out of whack. The pile of bad smells grows tall and the exchange rate seems quite bad.

The 2020s have taught me a lot about the reality of civics and what it looks like to try to govern a giant group of strangers.

I'm grateful to have lived in the US during a period of stability and safety and (some) shared values. My optimism for retaining that and preventing decline, however, is running very dry.

9rx · 2025-08-06T17:18:23 1754500703

> it is a surprise how many people in the country are supporters of pedophilia.

Do you mean ephebophilia? There is no prominent pedophilia movement. The Epstein saga, which is presumably at least somewhat related to what you are referring to, is clearly centred around "almost adults". Assuming that is what you meant, I don't see what is surprising about it. A revolt to the "Teen Mom", "16 and Pregnant" movement was inevitable. People are increasingly growning tired of being shamed for having children.

FrustratedMonky · 2025-08-06T18:00:15 1754503215

I was just referring to the predominant number of cases where Church officials, and Republicans are caught in under-age scandals. It seems like it is coming out of the shadows now, and Republicans are just openly going with it, they like em young and illegal. Epstein is just the case where the 'right' bothered keeping up tabs on it, so now they are clutching their pearls.

9rx · 2025-08-06T18:23:45 1754504625

> I was just referring to the predominant number of cases where Church officials, and Republicans are caught in under-age scandals.

But even that is characterized by the "choir boy", not the "baby being baptized". Where is this pedophilia idea coming from?

FrustratedMonky · 2025-08-06T19:41:08 1754509268

? Not sure I understand the point you're making. Maybe we are using different age cutoffs. And Republicans are justifying Trumps actions by lowering the age before being considered children? Thus making it not pedophilia?

9rx · 2025-08-07T05:50:45 1754545845

> Not sure I understand the point you're making.

Where in my questions to try and understand what you are trying to say do you find a point?

> Republicans are justifying Trumps actions by lowering the age before being considered children?

What has you questioning that? I mean, research has shown a link between exposure to certain chemicals and early-onset puberty. It is technically possible to lower the average age where one stops being a child, but that it is being purposefully done by Republicans as some kind of Trump justification is one of the more, uh, interesting conspiracy theories I've heard.

> Thus making it not pedophilia?

Even if we were to buy into this conspiracy theory, it wouldn't change anything about pedophilia. But this has little to do with the discussion that was taking place. As amusing as this has been, let's get back on-topic: Where did the original pedophilia idea you had come from?

FrustratedMonky · 2025-08-07T12:15:47 1754568947

I never said anything about Republicans using chemicals to create younger adults. It was about changing the age of consent. I was saying they are arguing for a lower age to be ok.

It seemed as if the argument was just around moving the age where childhood ends. Just re-contextualizing the argument.

Like the argument "well she looked older" so it is ok.

Kind of like middle eastern countries marry off girls at 11 or 12. That would be disgusting to the west. But if Trump does it, it seems like Republicans are happy enough to slide the scale and say "well 14 isn't that bad".

Since so many Republicans and Church officials get charged with having sex with kids, and there are so many arguments trying to justify it, and they continue to vote for and support these people, that I can say Republicans support pedophiles.

9rx · 2025-08-07T13:58:44 1754575124

> I never said anything about Republicans using chemicals to create younger adults.

You don't say? Try reading it again.

> I was saying they are arguing for a lower age to be ok.

We likely touched on that when we talked about people starting to reject what I called the "Teen Mom" movement. Yes, people are waking up to the fact that not having children until their geriatric years, if at all, is... not great. They are sick of being shamed for wanting to have children and are pushing back.

But my question was about your claims related to pedophilia. That's another subject entirely.

> I can say Republicans support pedophiles.

So you keep saying, but all you've come up with is some groups of people who have no doubt broken the law, but haven't exhibited any signs of pedophilia.

At this point the only logical expiation here is that you've made up a random definition for the word on the spot and are trying to use it as some kind of boogieman in hopes on preying on those who assume you are using the term in good faith. And if that is the case, I have to question why you find your bad faith participation acceptable?

FrustratedMonky · 2025-08-07T15:28:03 1754580483

That you keep disagreeing, made me doubt, maybe I am using the word wrong. So I looked it up.

You are technically correct. If Trump was sleeping with 14 year old, and the cutoff for Pedophilia is 13. Then, guess he is in the clear. And Gaetz, paid for sex with 17 year old. So guess again, you could argue 17 is "old enough to play ball".

Guess I was wrong. 14-18 is ok for Republicans.

I'd say the Church steers younger, but I'm sure you can look that up.

https://en.wikipedia.org/wiki/Pedophilia

"Although girls typically begin the process of puberty at age 10 or 11, and boys at age 11 or 12,[3] psychiatric diagnostic criteria for pedophilia extend the cut-off point for prepubescence to age 13"

9rx · 2025-08-07T16:14:25 1754583265

> the cutoff for Pedophilia is 13.

The "cutoff" is normally considered to be puberty. At puberty the human form starts to change, which is what we consider significant. That doesn't magically happen on your 13th birthday. Not having reached puberty by the age of 13 would be statistically unusual, though. I guess that is what you mean?

But, sure, pedophilia might even apply to a 14 year old in some extreme cases. Regardless, it is quite apparent that Trump's ilk like those who at least show some "development". If they are closeted pedophiles, they have certainly done well to keep that from the limelight.

> Guess I was wrong.

Arbitrarily defining a word doesn't make you wrong. All words are ultimately randomly made up on the spot. That is how natural language works. However, the behaviour that followed was, let's say... interesting. I specifically asked in the very first sentence if you meant ephebophilia to clarify your intent. Rationally, that is where one would say: "Oh yes, that is what I am talking about", "I define pedophilia as...", or at very least "I am not familiar with ephebophilia. What is that?" so that we could converge on a shared understanding. Instead, we got a wild train ride to nowhere, only now finally getting an answer to the original question.

But, hey, you do you.

FrustratedMonky · 2025-08-07T18:17:18 1754590638

It seemed like you were questioning the definition of pedophilia, so I literally gave you a definition with a link. Republican arguments do typically include wanting to lower the age to match up with puberty. But that seems like a pretty unseemly argument. If Trump was having sex with 14 year olds, but they were 'developed', so it is not technically pedophilia, so can't call him a pedophile. Is a pretty weak argument.

I mean, isn't it a pretty common joke that pedophiles like to argue this point:

"hmm, indubitably, my good sir, it is actually ephebophilia, completely different, totally ok. A common mistake I often have to explain to my 14 year old's girlfriends parents that keep calling the police on me. Why just look at her, very developed, very mature, not a child at all".

But have to agree. Not sure what you are trying to say at this point, or what is the argument?

9rx · 2025-08-07T18:42:49 1754592169

> or what is the argument?

What argument are you talking about? As I said before, I am trying to understand you. For some reason you continually speak in weird logically fallacies and made up stories that don't make any sense, even though you seem to have genuine intent and are trying to add positive contributions. In the interest of good faith, it is worth trying to figure out what you actually mean.

Yes, you eventually delivered the necessary clarification that allows us to better understand what you were trying to say, but what the hell was that in between?

FrustratedMonky · 2025-08-07T20:03:05 1754596985

If you really want to disect where things got off track. I think it was here.

My original comment >>>

"I was just referring to the predominant number of cases where Church officials, and Republicans are caught in under-age scandals. It seems like it is coming out of the shadows now, and Republicans are just openly going with it, they like em young and illegal. Epstein is just the case where the 'right' bothered keeping up tabs on it, so now they are clutching their pearls."

I think is in line with my most recent comment. I think the direction is clear.

When you replied with this >>>

"But even that is characterized by the "choir boy", not the "baby being baptized". Where is this pedophilia idea coming from?"

I was lost on what was being asked. I didn't understand the choir boy reference, or why pedophilia was a question, since it seemed my point was pretty pointed.

tankenmate · 2025-08-06T06:17:08 1754461028

chat log please?

9rx · 2025-08-06T06:41:37 1754462497

https://dpaste.org/zOev0

lucumo · 2025-08-06T08:25:41 1754468741

I've never found the Socratic method to work well on any model I've tried it with. They always seem to get stuck justifying their previous answers.

We expect them to answer the question and re-reason the original question with the new information, because that's what a human would do. Maybe next time I'll try to be explicit about that expectation when I try the Socratic method.

rafaelmn · 2025-08-06T07:14:12 1754464452

Is the knowledge cutoff for this thing so stale or is this just bad performance on recent data ?

dragonwriter · 2025-08-06T08:07:12 1754467632

The knowledge cutoff is before the 2024 election (which was, after all, just 9 months ago), June 2024 (I believe this is consistent with the current versions of GPT-4o and -4.1), after Biden had secured the nomination.

It is very clear in that chat logs (which include reasoning traces) that the model knew that, knew what the last election it knew about was, and answered correctly based on its cut off initially. Under pressure to answer about an election that was not within its knowledge window it then confabulated a Biden 2024 victory, which it dug in on after being contradicted with a claim that, based on the truth at the time of its knowledge cutoff, was unambiguously false ("Joe Biden did not run") He, in fact, did run for reelection, but withdrew after having secured enough delegates to win the nomination by a wide margin on July 21. Confabulation (called "hallucination" in AI circles, but it is more like human confabulation than hallucination) when pressed for answers on questions for which it lacks grounding remains an unsolved AI problem.

Unsolved, but mitigated by providing it grounding independent of its knowledge cutoff, e.g., by tools like web browsing (which GPT-OSS is specifically trained for, but that training does no good if its not hooked into a framework which provides it the tools.)

jlaternman · 2025-08-07T05:08:26 1754543306

I like that term much better, confabulation. I’ve come to think of it as it relies on an inherent trust in the fact that whatever process it uses to produce a coherent response (which I don’t think the LLM can really analyze after the fact) is inherently a truth-making process, since it trusts inherently its training data and considers that the basis of all its responses. Something along those lines. We might do something similar at times as humans, it feels similar to how some people get trapped in lies and almost equate what they have said as true with having the quality of truth as a result of them having claimed it as true (pathological liars can demonstrate this kind of thinking).

9rx · 2025-08-07T14:52:10 1754578330

> since it trusts inherently its training data and considers that the basis of all its responses.

Doesn't that make "hallucination" the better term? The LLM is "seeing" something in the data that isn't actually reflected in reality. Whereas "confabulation" would imply that LLMs are creating data out of "thin air", which leaves the training data to be immaterial.

Both words, as they have been historically used, need to be stretched really far to fit an artificial creation that bears no resemblance to what those words were used to describe, so, I mean, any word is as good as any other at that point, but "hallucination" requires less stretching. So I am curious about why you like "confabulation" much better. Perhaps it simply has a better ring to your ear?

But, either way, these pained human analogies have grown tired. It is time to call it what it really is: Snorfleblat.

9rx · 2025-08-06T07:22:21 1754464941

It is painful to read, I know, but if you make it towards the end it admits that its knowledge cutoff was prior to the election and that it doesn't know who won. Yet, even then, it still remains adamant that Biden won.

FergusArgyll · 2025-08-06T14:02:37 1754488957

incredible

tweezy · 2025-08-06T14:35:48 1754490948

Just out of curiosity, did you set the reasoning effort to high?

I've seen big jumps in performance between medium and high reasoning. Not that this is will solve your particular issue, but curious to see what settings you had.

Edit: I gave this a shot. My settings were temperature = 0.00 (not sure if these models respect temp settings but did it just in case), I also set reasoning effort to high. I'm using the groq-hosted version.

This was my prompt

"You have a fox, a chicken and some grain. The chicken will eat the grain if left alone with it. The fox is vegetarian, for religious reasons, and will also eat the grain if left alone with it. You need to get all across a river, but your boat can only take one at a time. How do you proceed?"

This was the response:

"*Solution – 7 moves*

|------|--------------|----------------------------|------------------------|

| 4 | *Bring the grain back* | *Farmer + Grain + Chicken* | Fox |

| 7 | *Take the grain* across | – | *Farmer + Fox + Chicken + Grain* |

*Explanation*

- The grain is the only thing that can be eaten, and it cannot be left alone with either the fox or the chicken. - So the grain must always be with the farmer whenever any other animal is on the same side.

The sequence above never leaves the grain with the fox or the chicken without the farmer, and after the seventh move all three items are safely on the far side."

vonneumannstan · 2025-08-06T19:08:52 1754507332

This kind of stuff is so tired. Who cares if it can't solve your silly riddle? It can probably do competitive coding at a world class level and we're quibbling over child riddles? Yeah you know my backhoe is really bad at cutting my toe nails, what a PoS.

zgk7iqea · 2025-08-07T12:20:00 1754569200

Feels like this is the wrong site for profanity

captainregex · 2025-08-05T22:02:47 1754431367

I’m still trying to understand what is the biggest group of people that uses local AI (or will)? Students who don’t want to pay but somehow have the hardware? Devs who are price conscious and want free agentic coding?

Local, in my experience, can’t even pull data from an image without hallucinating (Qwen 2.5 VI in that example). Hopefully local/small models keep getting better and devices get better at running bigger ones

It feels like we do it because we can more than because it makes sense- which I am all for! I just wonder if i’m missing some kind of major use case all around me that justifies chaining together a bunch of mac studios or buying a really great graphics card. Tools like exo are cool and the idea of distributed compute is neat but what edge cases truly need it so badly that it’s worth all the effort?

wizee · 2025-08-05T22:22:12 1754432532

Privacy, both personal and for corporate data protection is a major reason. Unlimited usage, allowing offline use, supporting open source, not worrying about a good model being taken down/discontinued or changed, and the freedom to use uncensored models or model fine tunes are other benefits (though this OpenAI model is super-censored - “safe”).

I don’t have much experience with local vision models, but for text questions the latest local models are quite good. I’ve been using Qwen 3 Coder 30B-A3B a lot to analyze code locally and it has been great. While not as good as the latest big cloud models, it’s roughly on par with SOTA cloud models from late last year in my usage. I also run Qwen 3 235B-A22B 2507 Instruct on my home server, and it’s great, roughly on par with Claude 4 Sonnet in my usage (but slow of course running on my DDR4-equipped server with no GPU).

M4R5H4LL · 2025-08-05T22:35:28 1754433328

+1 - I work in finance, and there's no way we're sending our data and code outside the organization. We have our own H100s.

filoleg · 2025-08-06T01:12:48 1754442768

Add big law to the list as well. There are at least a few firms here that I am just personally aware of running their models locally. In reality, I bet there are way more.

atlasunshrugged · 2025-08-06T02:29:55 1754447395

Add government here too (along with all the firms that service government customers)

rasmus1610 · 2025-08-06T05:06:45 1754456805

Add healthcare. Cannot send our patients data to a cloud provider

nixgeek · 2025-08-06T05:28:48 1754458128

A ton of EMR systems are cloud-hosted these days. There’s already patient data for probably a billion humans in the various hyperscalers.

Totally understand that approaches vary but beyond EMR there’s work to augment radiologists with computer vision to better diagnose, all sorts of cloudy things.

It’s here. It’s growing. Perhaps in your jurisdiction it’s prohibited? If so I wonder for how long.

fineIllregister · 2025-08-06T08:10:15 1754467815

In the US, HIPAA requires that health care providers complete a Business Associate Agreement with any other orgs that receive PHI in the course of doing business [1]. It basically says they understand HIPAA privacy protections and will work to fulfill the contracting provider's obligations regarding notification of breaches and deletion. Obviously any EMR service will include this by default.

Most orgs charge a huge premium for this. OpenAI offers it directly [2]. Some EMR providers are offering it as an add-on [3], but last I heard, it's wicked expensive.

1: https://www.hhs.gov/hipaa/for-professionals/covered-entities...

2: https://help.openai.com/en/articles/8660679-how-can-i-get-a-...

3: https://www.ntst.com/carefabric/careguidance-solutions/ai-do...

dragonwriter · 2025-08-06T08:15:53 1754468153

> Most LLM companies might not even offer it.

I'm pretty sure the LLM services of the big general-purpose cloud providers do (I know for sure that Amazon Bedrock is a HIPAA Eligible Service, meaning it is covered within their standard Business Associate Addendum [their name for the Business Associate Agreeement as part of an AWS contract].)

https://aws.amazon.com/compliance/hipaa-eligible-services-re...

fineIllregister · 2025-08-06T08:40:08 1754469608

Sorry to edit snipe you; I realized I hadn't checked in a while so I did a search and updated my comment. It appears OpenAI, Google, and Anthropic also offer BAAs for certain LLM services.

linuxftw · 2025-08-06T12:04:51 1754481891

I worked a big health care company recently. We were using Azure's private instances of the GPT models. Fully industry compliant.

londons_explore · 2025-08-06T06:44:24 1754462664

Even if it's possible, there is typically a lot of paperwork to get that stuff approved.

There might be a lot less paperwork to just buy 50 decent GPU's and have the IT guy self-host.

kakoni · 2025-08-06T07:12:29 1754464349

Europe? US? In Finland doctors can send live patient encounters to azure openai for transcription and summarization.

filoleg · 2025-08-06T18:50:08 1754506208

In the US, it would be unthinkable for a hospital to send patient data to something like ChatGPT or any other public services.

Might be possible with some certain specific regions/environments of Azure tho, because iirc they have a few that support government confidentiality type of stuff, and some that tout HIPAA compliance as well. Not sure about details of those though.

LinXitoW · 2025-08-06T08:45:52 1754469952

Possibly stupid question, but does this apply to things like M365 too? Because just like with Inference providers, the only thing keeping them from reading/abusing your data is a pinky promise contract.

Basically, isn't your data as safe/unsafe in a sharepoint folder as it is sending it to a paid inference provider?

Bombthecat · 2025-08-06T09:04:08 1754471048

Yap, companies are just paranoid, because it's new. Just like the cload back then. Sooner or later everyone will use an ai provider

megaloblasto · 2025-08-06T13:15:04 1754486104

A lot of people and companies use local storage and compute instead of the cloud. Cloud data is leaked all the time.

Foobar8568 · 2025-08-06T05:06:11 1754456771

Look at (private) banks in Switzerland, there are enough press release, and I can confirm most of them.

Managing private clients direct data is still a concern if it can be directly linked to them.

Only JB I believe have on premise infrastructure for these use cases.

helsinki · 2025-08-06T07:11:50 1754464310

This is not a shared sentiment across the buy side. I’m guessing you work at a bank?

undefuser · 2025-08-06T10:08:04 1754474884

Does it mean that renting a Bare metal server with H100s is also out of question for your org?

arkonrad · 2025-08-06T13:57:48 1754488668

Do you have your own platform to run inference?

captainregex · 2025-08-05T22:32:42 1754433162

I do think Devs are one of the genuine users of local into the future. No price hikes or random caps dropped in the middle of the night and in many instances I think local agentic coding is going to be faster than the cloud. It’s a great use case

exasperaited · 2025-08-06T14:36:40 1754491000

I am extremely cynical about this entire development, but even I think that I will eventually have to run stuff locally; I've done some of the reading already (and I am quite interested in the text to speech models).

(Worth noting that "run it locally" is already Canva/Affinity's approach for Affinity Photo. Instead of a cloud-based model like Photoshop, their optional AI tools run using a local model you can download. Which I feel is the only responsible solution.)

mark_l_watson · 2025-08-06T13:37:03 1754487423

I agree totally. My only problem is local models running on my old macMini run very much slower than that for example Gemini-2.5-flash. I have my Emacs setup so I can switch between a local model and one of the much faster commercial models.

Someone else responded to you about working for a financial organization and not using public APIs - another great use case.

gorbypark · 2025-08-06T14:49:27 1754491767

These being mixture of expert (MOE) models should help. The 20b model only has 3.6b params active at any one time, so minus a bit of overhead the speed should be like running a 3.6b model (while still requiring the RAM of a 20b model).

Here's the ollama version (4.6bit quant, I think?) run with --verbose total duration: 21.193519667s load duration: 94.88375ms prompt eval count: 77 token(s) prompt eval duration: 1.482405875s prompt eval rate: 51.94 tokens/s eval count: 308 token(s) eval duration: 19.615023208s eval rate: 15.70 tokens/s

15 tokens/s is pretty decent for a low end MacBook Air (M2, 24gb of ram). Yes, it's not the ~250 tokens/s of 2.5-flash, but for my use case anything above 10 tokens/sec is good enough.

robwwilliams · 2025-08-06T00:07:10 1754438830

Yes, and help with grant reviews. Not permitted to use web AI.

xrmagnum · 2025-08-06T00:05:52 1754438752

It's striking how much of the AI conversation focuses on new use cases, while overlooking one of the most serious non-financial costs: privacy.

I try to be mindful of what I share with ChatGPT, but even then, asking it to describe my family produced a response that was unsettling in its accuracy and depth.

Worse, after attempting to delete all chats and disable memory, I noticed that some information still seemed to persist. That left me deeply concerned—not just about this moment, but about where things are headed.

The real question isn't just "what can AI do?"—it's "who is keeping the record of what it does?" And just as importantly: "who watches the watcher?" If the answer is "no one," then maybe we shouldn't have a watcher at all.

ludwik · 2025-08-06T05:45:14 1754459114

> Worse, after attempting to delete all chats and disable memory, I noticed that some information still seemed to persist.

I'm fairly sure "seemed" is the key word here. LLMs are excellent at making things up - they rarely say "I don't know" and instead generate the most probable guess. People also famously overestimate their own uniqueness. Most likely, you accidentally recreated a kind of Barnum effect for yourself.

staplers · 2025-08-06T03:00:43 1754449243

  Worse, after attempting to delete all chats and disable memory, I noticed that some information still seemed to persist.

Chatgpt was court ordered to save history logs.

https://www.malwarebytes.com/blog/news/2025/06/openai-forced...

Oreb · 2025-08-06T05:15:42 1754457342

That only means that OpenAI have to keep logs of all conversations, not that ChatGPT will retain memories of all conversations.

nudgeOrnurture · 2025-08-07T04:27:23 1754540843

you could explain that to ChatGPT and it would agree but then again, if you HAVE TO keep the logs ...

scubbo · 2025-08-06T01:06:33 1754442393

> I try to be mindful of what I share with ChatGPT, but even then, asking it to describe my family produced a response that was unsettling in its accuracy and depth.

> Worse, after attempting to delete all chats and disable memory, I noticed that some information still seemed to persist.

Maybe I'm missing something, but why wouldn't that be expected? The chat history isn't their only source of information - these models are trained on scraped public data. Unless there's zero information about you and your family on the public internet (in which case - bravo!), I would expect even a "fresh" LLM to have some information even without you giving it any.

rcruzeiro · 2025-08-06T01:25:05 1754443505

I think you are underestimating how notable a person needs to be for their information to be baked into a model.

nl · 2025-08-06T03:41:30 1754451690

LLMs can learn from a single example.

https://www.fast.ai/posts/2023-09-04-learning-jumps/

brookst · 2025-08-06T04:20:18 1754454018

That doesn’t mean they learn from every single example.

canvascritic · 2025-08-05T22:10:36 1754431836

Healthcare organizations that can't (easily) send data over the wire while remaining in compliance

Organizations operating in high stakes environments

Organizations with restrictive IT policies

To name just a few -- well, the first two are special cases of the last one

RE your hallucination concerns: the issue is overly broad ambitions. Local LLMs are not general purpose -- if what you want is local ChatGPT, you will have a bad time. You should have a highly focused use case, like "classify this free text as A or B" or "clean this up to conform to this standard": this is the sweet spot for a local model

nojito · 2025-08-06T00:54:02 1754441642

Pretty much all the large players in healthcare (provider and payer) have model access (OpenAI, Gemini, Anthropic)

canvascritic · 2025-08-06T18:33:06 1754505186

This may be true for some large players in coastal states but definitely not true in general

Your typical non-coastal state run health system does not have model access outside of people using their own unsanctioned/personal ChatGPT/Claude accounts. In particular even if you have model access, you won't automatically have API access. Maybe you have a request for an API key in security review or in the queue of some committee that will get to it in 6 months. This is the reality for my local health system. Local models have been a massive boon in the way of enabling this kind of powerful automation at a fraction of the cost without having to endure the usual process needed to send data over the wire to a third party

ptero · 2025-08-06T04:45:05 1754455505

That access is over a limited API and usually under heavy restrictions on the healthcare org side (e. g., only use a dedicated machine, locked up software, tracked responses and so on).

Running a local model is often much easier: if you already have data on a machine and can run a model without breaching any network one could run it without any new approvals.

nojito · 2025-08-06T10:53:54 1754477634

What? It’s a straight connect to the models api from azure, aws, or gcp.

I am literally using Claude opus 4.1 right now.

ptero · 2025-08-06T23:11:27 1754521887

> I am literally using Claude opus 4.1 right now

On HIPAA data?

HIPAA systems at any sane company will not have "a straight connect" to anything on Asure, AWS or GCP. They will likely have a special layer dedicated to record keeping and compliance.

canvascritic · 2025-08-06T18:34:29 1754505269

Most healthcare systems are not using Azure, AWS, or GCP

captainregex · 2025-08-05T22:29:15 1754432955

Aren’t there HIPPA compliant clouds? I thought Azure had an offer to that effect and I imagine that’s the type of place they’re doing a lot of things now. I’ve landed roughly where you have though- text stuff is fine but don’t ask it to interact with files/data you can’t copy paste into the box. If a user doesn’t care to go through the trouble to preserve privacy, and I think it’s fair to say a lot of people claim to care but their behavior doesn’t change, then I just don’t see it being a thing people bother with. Maybe something to use offline while on a plane? but even then I guess United will have Starlink soon so plane connectivity is gonna get better

coredog64 · 2025-08-05T22:37:06 1754433426

It's less that the clouds are compliant and more that risk management is paranoid. I used to do AWS consulting, and it wouldn't matter if you could show that some AWS service had attestations out the wazoo or that you could even use GovCloud -- some folks just wouldn't update priors.

edm0nd · 2025-08-05T22:38:13 1754433493

>HIPPA

https://i.pinimg.com/474x/4c/4c/7f/4c4c7fb0d52b21fe118d998a8...

wubrr · 2025-08-05T22:39:03 1754433543

If you're building any kind of product/service that uses AI/LLMs the answer is the same as why any company would want to run any other kind of OSS infra/service instead of relying on some closer proprietary vendor API.

  - Costs.
  - Rate limits.
  - Privacy.
  - Security.
  - Vendor lock-in.
  - Stability/backwards-compatibility.
  - Control.
  - Etc.

brookst · 2025-08-06T04:22:14 1754454134

Except many OSS products have all of that and equal or better performance.

somenameforme · 2025-08-06T04:11:38 1754453498

Why not turn the question around. All other things being equal, who would prefer to use a rate limited and/or for-pay service if you could obtain at least comparable quality locally for free with no limitations, no privacy concerns, no censorship (beyond that baked into the weights you choose to use), and no net access required?

It's a pretty bad deal. So it must be that all other things aren't equal, and I suppose the big one is hardware. But neural net based systems always have a point of sharply diminishing returns, which we seem to have unambiguously hit with LLMs already, while the price of hardware is constantly decreasing and its quality increasing. So as we go further into the future, the practicality of running locally will only increase.

cco · 2025-08-05T22:49:16 1754434156

> I’m still trying to understand what is the biggest group of people that uses local AI (or will)?

Well, the model makers and device manufacturers of course!

While your Apple, Samsung, and Googles of the world will be unlikely to use OSS models locally (maybe Samsung?), they all have really big incentives to run models locally for a variety of reasons.

Latency, privacy (Apple), cost to run these models on behalf of consumers, etc.

This is why Google started shipping 16GB as the _lowest_ amount of RAM you can get on your Pixel 9. That was a clear flag that they're going to be running more and more models locally on your device.

As mentioned, it seems unlikely that US-based model makers or device manufacturers will use OSS models, they'll certainly be targeting local models heavily on consumer devices in the near future.

Apple's framework of local first, then escalate to ChatGPT if the query is complex will be the dominant pattern imo.

MYEUHD · 2025-08-05T23:08:37 1754435317

>Google started shipping 16GB as the _lowest_ amount of RAM you can get on your Pixel 9.

The Pixel 9 has 12GB of RAM[0]. You probably meant the Pixel 9 Pro.

[0]: https://www.gsmarena.com/google_pixel_9-13219.php

username135 · 2025-08-06T02:33:09 1754447589

Still an absurd amount of RAM for a phone, imo

mrheosuper · 2025-08-06T04:10:32 1754453432

Not absurd. The base S21 Ultra from 2021 already shipped with 12GB ram. 4 Years later and the amount of ram is still the same

shkkmo · 2025-08-06T03:33:35 1754451215

Seems about right, my new laptop has 8x that which is a about the same ratio that my last new laptop had to my phone at the time.

SchemaLoad · 2025-08-05T22:57:56 1754434676

Device makers also get to sell you a new device when you want a more powerful LLM.

jus3sixty · 2025-08-06T02:29:13 1754447353

Bingo!

setopt · 2025-08-06T02:40:39 1754448039

I’m highly interested in local models for privacy reasons. In particular, I want to give an LLM access to my years of personal notes and emails, and answer questions with references to those. As a researcher, there’s lots of unpublished stuff in there that I sometimes either forget or struggle to find again due to searching for the wrong keywords, and a local LLM could help with that.

I pay for ChatGPT and use it frequently, but I wouldn’t trust uploading all that data to them even if they let me. I’ve so far been playing around with Ollama for local use.

barnabee · 2025-08-05T22:11:45 1754431905

~80% of the basic questions I ask of LLMs[0] work just fine locally, and I’m happy to ask twice for the other 20% of queries for the sake of keeping those queries completely private.

[0] Think queries I’d previously have had to put through a search engine and check multiple results for a one word/sentence answer.

benreesman · 2025-08-06T10:22:12 1754475732

"Because you can and its cool" would be reason enough: plenty of revolutions have their origin in "because you can" (Wozniak right off the top of my head, Gates and Altair, stuff like that).

But uncensored is a big deal too: censorship is capability reducing (check out Kilcher's GPT4Chan video and references, the Orca work and Dolphin de-tune lift on SWE-Bench style evals). We pay dearly in capability to get "non-operator-alignment", and you'll notice that competition is hot enough now that at the frontier (Opus, Qwen) the " alignment" away from operators aligned is getting very, very mild.

And then there's the compression. Phi-3 or something works on a beefy laptop and has a nontrivial approximation of "the internet" that works on an airplane or a beach with no network connectivity, talk about vibe coding? I like those look up all the docs via a thumbdrive in Phuket vibes.

And on diffusion stuff, SOTA fits on a laptop or close, you can crush OG mid journey or SD on a macbook, its an even smaller gap.

Early GPT-4 ish outcomes are possible on a Macbook Pro or Razer Blade, so either 12-18 month old LLMs are useless, or GGUF is useful.

The AI goalposts things cuts both ways. If AI is "whatever only Anthropic can do"? That's just as silly as "whatever a computer can't do" and a lot more cynical.

dcreater · 2025-08-05T22:36:51 1754433411

Why do any compute locally? Everything can just be cloud based right? Won't that work much better and scale easily?

We are not even at that extreme and you can already see the unequal reality that too much SaaS has engendered

kybernetikos · 2025-08-07T09:14:59 1754558099

> Won't that work much better and scale easily?

Doing computation that can happen at end points at the end points is massively more scaleable. Even better, its done by compute you usually aren't paying for if you're the company providing the service.

I saw an interview with the guy who made photopea where he talked about how tiny his costs were because all compute was done in the user's browser. Running a saas in a cloud is expensive.

It's an underrated aspect of what we used to call "software".

And that's leaving aside questions of latency and data privacy.

robwwilliams · 2025-08-06T01:02:37 1754442157

Comcast comes to mind ;-)

benreesman · 2025-08-06T13:43:04 1754487784

Real talk. I'm based in San Juan and while in general having an office job on a beautiful beach is about as good as this life has to offer, the local version of Comcast (Liberty) is juuusst unreliable enough that I'm buying real gear at both the office and home station after a decade of laptop and go because while it goes down roughly as often as Comcast, its even harder to get resolved. We had StarLink at the office for like 2 weeks, you need a few real computers lying around.

philip1209 · 2025-08-06T04:26:14 1754454374

I'm excited to do just dumb and irresponsible things with a local model, like "iterate through every single email in my 20-year-old gmail account and apply label X if Y applies" and not have a surprise bill.

I think it can make LLMs fun.

taneq · 2025-08-06T06:20:13 1754461213

I wrote a script to get my local Gemma3 insurance to tag and rename everything in my meme folder. :P

Roark66 · 2025-08-06T11:24:40 1754479480

People like myself that firmly believe there will come a time, possibly very soon that all these companies (OpenAI, Anthropic etc) will raise their prices substantially. By then no one will be able to do their work to the standard expected of them without AI, and by then maybe they charge $1k per month, maybe they charge $10k. If there is no viable alternative the sky is the limit.

Why do you think they continue to run at a loss? From the goodness of their heart? Their biggest goal is to discourage anyobe from running local models. The hardware is expensive... The way to run models is very difficult (for example I have dual rtx 3090 for vram and running large heavily quantized models is a real pain in the arse, no high quantisation library supports two GPUs for example, and there seems to be no interest in implementating it by the guys behind the best inference tools).

So this is welcome, but let's not forget why it is being done.

Gracana · 2025-08-06T13:51:47 1754488307

> no high quantisation library supports two GPUs for example, and there seems to be no interest in implementating it by the guys behind the best inference tools

I'm curious to hear what you're trying to run, because I haven't used any software that is not compatible with multiple GPUs.

jedberg · 2025-08-05T22:51:34 1754434294

Pornography, or any other "restricted use". They either want privacy or don't want to deal with the filters on commercial products.

I'm sure there are other use cases, but much like "what is BitTorrent for?", the obvious use case is obvious.

ineedasername · 2025-08-05T23:21:15 1754436075

A local laptop of the past few years without a discrete GPU can run, at practical speeds depending on task, a gemma/llama model if it's (ime) under 4GB.

For practical RAG processes of narrow scope and an even minimal amount of scaffolding a very usable speed for automating tasks, especially as the last-mile/edge device portion of a more complex process with better models in use upstream. Classification tasks, reasonay intelligent decisions between traditional workflow processes, other use cases-- a of them extremely valuable in enterprise, being built and deployed right now.

alecfong · 2025-08-06T02:44:28 1754448268

If you wanna compare on an h200 and play with trt-llm configs I setup this link here https://brev.nvidia.com/launchable/deploy?launchableID=env-3...

jona777than · 2025-08-06T05:21:44 1754457704

One of my favorite use cases includes simple tasks like generating effective mock/masked data from real data. Then passing the mock data worry-free to the big three (or wherever.)

There’s also a huge opportunity space for serving clients with very sensitive data. Health, legal, and government come to mind immediately. These local models are only going to get more capable of handling their use cases. They already are, really.

azinman2 · 2025-08-05T22:54:54 1754434494

I'm guessing its largely enthusiasts for now, but as they continue getting better:

1. App makers can fine tune smaller models and include in their apps to avoid server costs

2. Privacy-sensitive content can be either filtered out or worked on... I'm using local LLMs to process my health history for example

3. Edge servers can be running these fine tuned for a given task. Flash/lite models by the big guys are effectively like these smaller models already.

noosphr · 2025-08-05T22:54:28 1754434468

Data that can't leave the premises because it is too sensitive. There is a lot of security theater around cloud pretending to be compliant but if you actually care about security a locked server room is the way to do it.

lynnesbian · 2025-08-06T02:50:30 1754448630

I can provide a real-world example: Low-latency code completion.

The JetBrains suite includes a few LLM models on the order of a hundred megabytes. These models are able to provide "obvious" line completion, like filling in variable names, as well as some basic predictions, like realising that the `if let` statement I'm typing out is going to look something like `if let Some(response) = client_i_just_created.foobar().await`.

If that was running in The Cloud, it would have latency issues, rate limits, and it wouldn't work offline. Sure, there's a pretty big gap between these local IDE LLMs and what OpenAI is offering here, but if my single-line autocomplete could be a little smarter, I sure wouldn't complain.

mrheosuper · 2025-08-06T04:13:17 1754453597

I don't have latency issue with github copilot. Maybe i'm less sensitive to it.

danielvaughn · 2025-08-05T22:48:26 1754434106

Just imagine the next PlayStation or XBox shipping with these models baked in for developer use. The kinds of things that could unlock.

pcdoodle · 2025-08-06T01:23:52 1754443432

Good point. Take the state of the world and craft npc dialogue for instance.

danielvaughn · 2025-08-06T01:31:08 1754443868

Yep that’s my biggest ask tbh. I just imagine the next Elder Scrolls taking advantage of that. Would change the gaming landscape overnight.

okasaki · 2025-08-06T10:37:43 1754476663

Games with LLM characters have been done and it turns out this is a shit idea.

bavell · 2025-08-06T12:27:59 1754483279

There are a ton of ways to do this that haven't been tried yet.

danielvaughn · 2025-08-06T17:08:47 1754500127

I guarantee anything that’s already been put out is too early, and is very likely a rushed cash-grab. Which, of course that sucks.

And AI has been in games for a long time. Generated terrain and other sorts of automation have been used as techniques for a hot minute now.

All I’m suggesting is to keep on that same trajectory, now just using an on-device LLM to back intelligence features.

djeastm · 2025-08-06T17:54:59 1754502899

Sounds like a pre-Beatles "guitar groups are on their way out" kind of statement

dsubburam · 2025-08-06T03:05:51 1754449551

> I’m still trying to understand what is the biggest group of people that uses local AI (or will)?

Creatives? I am surprised no one's mentioned this yet:

I tried to help a couple of friends with better copy for their websites, and quickly realized that they were using inventive phrases to explain their work, phrases that they would not want competitors to get wind of and benefit from; phrases that associate closely with their personal brand.

Ultimately, I felt uncomfortable presenting the cloud AIs with their text. Sometimes I feel this way even with my own Substack posts, where I occasionally coin a phrase I am proud of. But with local AI? Cool...

flir · 2025-08-06T09:49:17 1754473757

> I tried to help a couple of friends with better copy for their websites, and quickly realized that they were using inventive phrases to explain their work, phrases that they would not want competitors to get wind of and benefit from; phrases that associate closely with their personal brand.

But... they're publishing a website. Which competitors will read. Which chatbots will scrape. I genuinely don't get it.

pickledoyster · 2025-08-07T07:47:45 1754552865

there's a difference between an internal brief and a public copy.

JKCalhoun · 2025-08-05T22:22:42 1754432562

I do it because 1) I am fascinated that I can and 2) at some point the online models will be enshitified — and I can then permanently fall back on my last good local version.

captainregex · 2025-08-05T22:30:39 1754433039

love the first and am sad you’re going to be right about the second

JKCalhoun · 2025-08-05T23:35:12 1754436912

When it was floated about that the DeepSeek model was to be banned in the U.S., I grabbed it as fast as I could.

Funny how that works.

bavell · 2025-08-06T12:32:05 1754483525

I mean, there's always torrents

JKCalhoun · 2025-08-06T18:33:58 1754505238

I expect so. Still, it was easy to not have to even think about that.

trenchpilgrim · 2025-08-05T23:29:36 1754436576

In some large, lucrative industries like aerospace many of the hosted models are off the table due to regulations such as ITAR. There'a a market for models which are run on prem/in GovCloud with a professional support contract for installation and updates.

lucumo · 2025-08-06T08:33:24 1754469204

I'm in a corporate environment. There's a study group to see if maybe we can potentially get some value out of those AI tools. They've been "studying" the issue for over a year now. They expect to get some cloud service that we can safely use Real Soon Now.

So, it'll take at least two more quarters before I can actually use those non-local tools on company related data. Probably longer, because sense of urgency is not this company's strong suit.

Anyway, as a developer I can run a lot of things locally. Local AI doesn't leak data, so it's safe. It's not as good as the online tools, but for some things they're better than nothing.

sturadnidge · 2025-08-06T01:24:36 1754443476

If you have capable hardware and kids, a local LLM is great. A simple system prompt customisation (e.g. ‘all responses should be written as if talking to a 10 year old’) and knowing that everything is private goes a long way for me at least.

itake · 2025-08-06T06:11:11 1754460671

Local micro models are both fast and cheap. We tuned small models on our data set and if the small model thinks content is a certain way, we escalate to the LLM.

This gives us really good recall at really low cloud cost and latency.

bavell · 2025-08-06T12:30:38 1754483438

I'd love to try this on my data set - what approach/tools/models did you use for fine-tuning?

itake · 2025-08-07T02:01:36 1754532096

Everything is built in-house unfortunately. Many of our small models are turned Qwen3. But we mostly chose the model on SOTA at the time we needed a model trained.

ricardobayes · 2025-08-06T13:08:09 1754485689

I would say, any company who doesn't have their own AI developed. You always hear companies "mandating" AI usage, but for the most part it's companies developing their own solutions/agents. No self-respecting company with a tight opsec would allow a random "always-online" LLM that could just rip your codebase either piece by piece or the whole thing at once if it's a IDE addon (or at least I hope that's the case). So yeah, I'd say locally deployed LLM's/Agents are a gamechanger.

seany · 2025-08-06T00:23:48 1754439828

Jail breaking then running censored questions. Like diy fireworks, or analysis of papers that touch "sensitive topics", nsfw image generation the list is basically endless.

jlokier · 2025-08-06T11:40:49 1754480449

At the company where I currently work, for IP reasons (and with the advice of a patent lawyer), nobody is allowed to use any online AIs to talk about or help with work, unless it's very generic research that doesn't give away what we're working on.

That rules out coding assistants like Claude, chat, tools to generate presentations and copy-edit documents, and so forth.

But local AI are fine, as long as we're sure nothing is uploaded.

dismalaf · 2025-08-06T03:50:48 1754452248

The use case is building apps.

A small LLM can do RAG, call functions, summarize, create structured data from messy text, etc... You know, all the things you'd do if you were making an actual app with an LLM.

Yeah, chat apps are pretty cheap and convenient for users who want to search the internet and write text or code. But APIs quickly get expensive when inputting a significant amount of tokens.

athrowaway3z · 2025-08-06T13:47:21 1754488041

Don't know about the biggest, but IMO the exciting things about open models is the possibility of creating whole new things.

For example, "generate a heatmap of each token/word and how 'unexpected' they are" or "find me a prompt that creates the closest match to this text"

To be efficient both require access that is not exposed over API.

adrianwaj · 2025-08-05T22:39:27 1754433567

Use Case?

How about running one on this site but making it publically available? A sort of outranet and calling it HackerBrain?

georgeecollins · 2025-08-06T04:28:55 1754454535

There's a bunch of great reasons in this thread, but how about the chip manufacturers that are going to need you to need a more powerful set of processors in your phone, headset, computer. You can count on those companies to subsidize some R&D and software development.

cameronh90 · 2025-08-05T23:18:16 1754435896

The cloud AI providers have unacceptable variation in response time for things that need a predictable runtime.

Even if they did offer a defined latency product, you’re relying on a lot of infrastructure between your application and their GPU.

That’s not always tolerable.

sznio · 2025-08-06T06:14:07 1754460847

>Students who don’t want to pay but somehow have the hardware?

that's me - well not a student anymore. when toying with something, i much prefer not paying for each shot. my 12GB Radeon card can either run a decent extremely slow, or a idiotic but fast model. it's nice not dealing with rate limits.

once you write a prompt that mangles an idiotic model into still doing the work, it's really satisfying. the same principle as working to extract the most from limited embedded hardware. masochism, possibly

yreg · 2025-08-06T14:52:00 1754491920

> I’m still trying to understand what is the biggest group of people that will use local AI?

iPhone users in a few months – because I predict app developers will love cramming calls to the foundation models into everything.

Android will follow.

etoxin · 2025-08-06T06:14:51 1754460891

Some app devs use local models on local environments with LLM APIs to get up and running fast, then when the app deploys it switches to the big online models via environment vars.

In large companies this can save quite a bit of money.

m463 · 2025-08-05T23:01:13 1754434873

One use nobody mentions is hybrid use.

Why not run all the models at home, maybe collaboratively or at least in parallel?

I'm sure there are use cases where the paid models are not allowed to collaborate or ask each other.

also, other open models are gaining mindshare.

muzani · 2025-08-06T06:15:38 1754460938

Privacy laws. Processing government paperwork with LLMs for example. There's a lot of OCR tools that can't be used, and the ones that comply are more expensive than say, GPT-4.1 and lower quality.

waterTanuki · 2025-08-06T23:35:12 1754523312

anything involving the medical industry (HIPAA laws), national security (FedRAMP is such a pita to get that some military contractors are bypassing it to get quicker access to cloud tools) etc.

Besides that, we are moving towards an era where we won't need to pay providers a subscription every month to use these models. I can't say for certain whether or not the GPUs that run them will get cheaper, but the option to run your own model is game changing for more than you can possibly imagine.

TrackerFF · 2025-08-06T07:56:06 1754466966

Agencies / firms that work with classified data. Some places have very strict policies on data, which makes it impossible to use any service that isn't local and air-gapped.

example: military intel

cyanydeez · 2025-08-06T00:55:58 1754441758

People who want programmatic solutions that wont be rug pulled

m3kw9 · 2025-08-06T00:05:46 1754438746

I’d use it on a plane if there was no network for coding, but otherwise it’s just an emergency model if the internet goes out, basically end of the world scenarios

julianozen · 2025-08-06T01:03:22 1754442202

worth mentioning that todays expensive hardware will be built into the cheapest iPhone in less than 10 years.

That means running instantly offline and every token is free

nfRfqX5n · 2025-08-06T00:26:32 1754439992

You’re asking the biggest group of people who would want to do this

unethical_ban · 2025-08-05T22:17:18 1754432238

Privacy and equity.

Privacy is obvious.

AI is going to to be equivalent to all computing in the future. Imagine if only IBM, Apple and Microsoft ever built computers, and all anyone else ever had in the 1990s were terminals to the mainframe, forever.

captainregex · 2025-08-05T22:25:45 1754432745

I am all for the privacy angle and while I think there’s certainly a group of us, myself included, who care deeply about it I don’t think most people or enterprises will. I think most of those will go for the easy button and then wring their hands about privacy and security as they have always done while continuing to let the big companies do pretty much whatever they want. I would be so happy to be wrong but aren’t we already seeing it? Middle of the night price changes, leaks of data, private things that turned out to not be…and yet!

robwwilliams · 2025-08-06T01:05:13 1754442313

I wring my hands twice a week about internet service providers; Comcast and Starlink. And I live in a nominally well serviced metropolitan area.

xigoi · 2025-08-08T09:16:40 1754644600

> AI is going to to be equivalent to all computing in the future.

Thanks, but I prefer my computing to be deterministic if at all possible.

bavell · 2025-08-06T12:36:23 1754483783

Did you mean to type equality? As in, "everyone on equal footing"? Otherwise, I'm not sure how to parse your statement.