
That seems like a very risky assumption for any car (self-driving or human-driven) during flash floods. "Turn around, don't drown":

You think you know how deep the water is because you've taken that road many times before (or, in your case, you have historical laser measurements).

But you don't know:

- Maybe the road underneath has fully collapsed

- Maybe the flow of water is extremely strong, so you need to accurately estimate that too.


I meant more that it could maybe see a significant difference in the road and know to exercise caution, not that it would try to gauge the depth of a submerged roadway.

Flow estimation should be doable with vision, and radar can do it as well: some bridges use surface-flow-monitoring radar.

> And the author is correct (while the phrasing is a bit weird.)

Right, that's just a description of the https://en.wikipedia.org/wiki/Baumol_effect


Speculative execution techniques in software & hardware exist everywhere:

- Speculative multi threading

- Data Value Speculation

- Speculative Memory Disambiguation

- Runahead Execution

- Speculative Prefetching

- Multi-path (Dual-path) Execution (goes beyond branch prediction by computing both paths)

- Optimistic Concurrency Control (for database transactions etc)
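
To make that last item concrete, here's a toy sketch of optimistic concurrency control in Python (a hypothetical in-memory store, not any particular database's implementation): do the work without holding locks, then validate a version number at commit time and retry on conflict.

  import threading

  class OptimisticStore:
      def __init__(self):
          self._commit_lock = threading.Lock()  # held only for the short validate+write window
          self._data = {}                       # key -> (value, version)

      def read(self, key):
          # Speculative phase: read the value and remember the version seen.
          return self._data.get(key, (None, 0))

      def commit(self, key, new_value, expected_version):
          # Validation phase: the write lands only if nobody else committed
          # to this key since our read; otherwise the caller must retry.
          with self._commit_lock:
              _, current = self._data.get(key, (None, 0))
              if current != expected_version:
                  return False
              self._data[key] = (new_value, current + 1)
              return True

  # Usage: compute off-lock, retry on conflict.
  store = OptimisticStore()
  value, version = store.read("balance")
  while not store.commit("balance", (value or 0) + 10, version):
      value, version = store.read("balance")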


My non-controversial theory: It's all the attention-span-shortening stuff.

- tech apps starting with infinite scroll (Facebook, 9gag, Instagram, etc.)

- media/tech shortened content: shorter TV shows, short video content, etc.

(TikTok is the "state of the art" of those two trends pushed to the max)

Specifically, we're getting more & more addicted to things that increase the frequency of dopamine spikes, making it increasingly difficult to get into deep-focus work.


Absolutely, we are feeding kids so many attention-span-killing things. Even as an adult I'm having a hard time with YouTube Shorts, and I cannot imagine a kid's brain having the means to deal with all that.


I am _fighting_ with elderly relatives to adjust their YouTube habits. They didn't even know autoplay is on by default. They don't even check sources; they just let the garbage in.


Also, the parents are using too many attention-span-killing things, which is hurting the attention they give their children.


I wonder if there's research on short-form but educational content, or if that's fundamentally impossible.

For example, I remember reading a lot of science magazines/articles growing up (granted, pop-sci, but for a kid it still teaches some things) and, as I grew up, things like The Economist.

Similarly, I played games like Math Blaster as a kid, and I've realized I need to intentionally provide games like this to my kids, games that ideally teach something (the bar being greater-than-zero learning), rather than have them play one of those infinite-runner games or whatever.

I think we're probably talking about the exact same thing, but I am curious where the line between educational content and short-form media lies.

Thanks for sharing :)


Last time I dove into the research, I found that Math Blaster had no impact on student learning.


That doesn't line up, though. See, if you were 13 and meeting the level in 2012, your scores don't decline, so the levels would lag a few years. The 8-year-olds showing up and missing the mark in 2017 would indicate the infinite-scroll problem was taking a toll on them. Additionally, this would start to show in class-specific measurements (those kids with access to home internet, personal devices, etc. would have worse scores). I think the argument about social media has merit in a discussion of children, but it seems more of a social distinction than an objective indicator of academic performance.


I certainly feel several degrees dumber than I did as a teenager without that stuff


When you say tok/s here, are you describing the prefill (prompt eval) tok/s or the output generation tok/s?

(Btw, I believe the "--jinja" flag has defaulted to true since sometime in late 2025, so it's not needed anymore.)


Here is llama-bench on the same M4:

  | model                    |       size |     params | backend    | threads |            test |                  t/s |
  | ------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: |
  | qwen35 27B Q4_K_M        |  15.65 GiB |    26.90 B | BLAS,MTL   |       4 |           pp512 |         61.31 ± 0.79 |
  | qwen35 27B Q4_K_M        |  15.65 GiB |    26.90 B | BLAS,MTL   |       4 |           tg128 |          5.52 ± 0.08 |
  | qwen35moe 35B.A3B Q3_K_M |  15.45 GiB |    34.66 B | BLAS,MTL   |       4 |           pp512 |        385.54 ± 2.70 |
  | qwen35moe 35B.A3B Q3_K_M |  15.45 GiB |    34.66 B | BLAS,MTL   |       4 |           tg128 |         26.75 ± 0.02 |
So ~60 tok/s for prefill and ~5 tok/s for output on the 27B, and roughly 5x both on the 35B-A3B.
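
(For anyone wanting to reproduce: pp512 and tg128 are llama-bench's default tests, so an invocation along these lines should print a comparable table. The GGUF filename is just a placeholder.)

  llama-bench -m qwen3.5-27b-Q4_K_M.gguf -p 512 -n 128 -t 4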


If someone doesn't specifically say prefill then they always mean decode speed. I have never seen an exception. Most people just ignore prefill.


But isn't the prefill speed the bottleneck in some systems*?

Sure, it's an order of magnitude faster (10x on Apple Metal?), but there's also an order of magnitude more tokens to process, especially for tasks involving summarization of some sort.
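
A rough illustration, using the M4 27B numbers upthread (~61 tok/s prefill, ~5.5 tok/s decode) and made-up document/summary sizes:

  prefill a 6,000-token document:  6000 / 61  ≈ 98 s
  decode a 300-token summary:       300 / 5.5 ≈ 55 s

Even with prefill running ~11x faster per token, it dominates wall-clock time once the input gets long.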

But point taken that the parent numbers are probably decode

* Specifically, Mac metal, which is what parent numbers are about


Yes, it's definitely the bottleneck for most use cases besides "chatting". It's the reason I have never bought a Mac for LLM purposes.

It's frustrating when trying to find benchmarks because almost everyone gives decode speed without mentioning prefill speed.


oMLX makes prefill effectively instantaneous on a Mac.

Storing an LRU KV cache of all your conversations, both in memory and on (plenty fast enough) SSD, especially including the fixed agent context every conversation starts with, means you go from "painfully slow" to "faster than using Claude" most of the time. It's kind of shocking this much perf was lying on the ground waiting to be picked up.
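
Not oMLX's actual code, just a toy Python sketch of the prefix-keyed LRU idea (all names hypothetical):

  import hashlib
  from collections import OrderedDict

  class PrefixKVCache:
      """Map a hash of the prompt prefix to its computed KV state,
      evicting the least-recently-used entry when over budget."""
      def __init__(self, max_entries=32):
          self.max_entries = max_entries
          self._cache = OrderedDict()  # prefix_hash -> kv_state

      @staticmethod
      def _key(prompt_prefix):
          return hashlib.sha256(prompt_prefix.encode()).hexdigest()

      def get(self, prompt_prefix):
          k = self._key(prompt_prefix)
          if k in self._cache:
              self._cache.move_to_end(k)  # mark as recently used
              return self._cache[k]       # hit: prefill can be skipped
          return None                     # miss: must prefill from scratch

      def put(self, prompt_prefix, kv_state):
          k = self._key(prompt_prefix)
          self._cache[k] = kv_state
          self._cache.move_to_end(k)
          if len(self._cache) > self.max_entries:
              self._cache.popitem(last=False)  # evict LRU (or spill to SSD)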

Open models are still dumber than leading closed models, especially for editing existing code. But I use it as essentially free "analyze this code, look for problem <x|y|z>", which Claude is happy to do for an enormous amount of consumed tokens.

But speed is no longer a problem. It's pretty awesome over here in unified memory Mac land :)


Right, they're not the only FAANG company that we know is doing it: https://news.ycombinator.com/item?id=46318494


Some might be tempted to brush aside the fact that the server Linux threat model is very different from the desktop Linux one (and to snarkily reply "well, it's powering a vast majority of GDP via all of AWS, Azure, etc.").

However, comparing apples to apples, what makes you say this isn't ready for government use when it's ready for the majority of the workforce at trillion-dollar big tech companies (aside from Microsoft and Apple, obviously)? Large employers like IBM etc. must also be using Red Hat or some other distro.


Google, for example, uses a fork of Ubuntu. When someone decided to compromise Google employees' machines via a fake npm package, they were able to do so successfully. When they reported this to Google, Google said it was okay for employee machines to be compromised and that it was part of Google's threat model. While this may be true for large companies, I don't think the French government is ready to handle such a security model.


> that it was part of Google's threat model

That's just PR to avoid the stock going down.


> I don't know how to force this issue as a European. There are just too many levels of abstraction between me and Brussels.

> EU moves so much faster when it comes to regulations like forcing all of us in Denmark to use timesheets, annoying lids on our bottles, and invasive surveillance laws.

Rediscovering the principle of subsidiarity from first principles...


> I'll need to investigate further but it doesn't seem promising.

That's what I meant by "waiting a few days for updates" in my other comment. At the Qwen 3.5 release, I remember a lot of complaints like "tool calling isn't working properly", etc.

That was fixed shortly after: there was some template-parsing work in llama.cpp, and Unsloth pulled some models and brought back better ones that improved something else I can't quite remember, a better-done quantization or something...

coder543 pointed out the same thing is happening with Gemma 4 tool calling: https://news.ycombinator.com/item?id=47619261


The model does call tools successfully, giving sensible parameters, but it doesn't seem to pick the right ones in the right order.

I'll try again in a few days. It's great to be able to test it only a few hours after the release. It's the bleeding edge, as I had to pull the latest from main. And with all the supply-chain issues happening everywhere, the bleeding edge is always riskier from a security point of view.

There is also always the possibility of fine-tuning the model later to make sure it can complete the custom task correctly. But the code for doing a LoRA on Gemma 4 is probably not yet available. The 50% extra speed seems really tempting.
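
For reference, once support lands, a minimal LoRA setup with Hugging Face PEFT might look something like this sketch (the model id is a placeholder, and Gemma 4 support in transformers/PEFT is assumed):

  from peft import LoraConfig, get_peft_model
  from transformers import AutoModelForCausalLM

  # Hypothetical model id; swap in the real Gemma 4 checkpoint name.
  model = AutoModelForCausalLM.from_pretrained("google/gemma-4-26b-a4b-it")

  # Low-rank adapters on the attention projections only.
  config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                      target_modules=["q_proj", "k_proj", "v_proj", "o_proj"])

  model = get_peft_model(model, config)
  model.print_trainable_parameters()  # a tiny fraction of the full model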


Wild differences in Elo compared to TFA's graph: https://storage.googleapis.com/gdm-deepmind-com-prod-public/...

(Comparing Q3.5-27B to G4 26B A4B and G4 31B specifically)

I'd assume Q3.5-35B-A3B would perform worse than the dense Q3.5 27B model, but the cards you pasted above somehow show that for Elo and TAU2 it's the other way around...

Very impressed by the Unsloth team releasing the GGUFs so quickly; if it's like the Qwen 3.5 release, I'll wait a few more days in case they make a major update.

Overall, great news if it's at parity with or slightly better than the Qwen 3.5 open weights; hope to see both of these evolve in the sub-32GB-RAM space. Disappointed in Mistral/Ministral being so far behind these US & Chinese models.


You're conflating LMArena Elo scores.

Qwen actually has a higher Elo there. The top Pareto-frontier open models are:

  model                        |elo  |price
  qwen3.5-397b-a17b            |1449 |$1.85
  glm-4.7                      |1443 | 1.41
  deepseek-v3.2-exp-thinking   |1425 | 0.38
  deepseek-v3.2                |1424 | 0.35
  mimo-v2-flash (non-thinking) |1393 | 0.24
  gemma-3-27b-it               |1365 | 0.14
  gemma-3-12b-it               |1341 | 0.11
  gpt-oss-20b                  |1318 | 0.09
  gemma-3n-e4b-it              |1318 | 0.03
https://arena.ai/leaderboard/text?viewBy=plot

What Gemma seems to have done is dominate the extreme cheap end of the market, which IMO is probably the most important and overlooked segment.


That Pareto plot doesn't seem to include the Gemma 4 models anywhere (not just not at the frontier), likely because pricing wasn't available when the chart was generated. At least, I can't find the Gemma 4 models there. So it's not particularly relevant until it is updated for the models released today.


Gemma 4 31B has wiped out several of those models from the Pareto frontier now that it has pricing. Gemma 4 26B A4B has an Elo, but no pricing, so it still isn't on that chart. The Gemma 4 E2B/E4B models still aren't on the arena at all, but I expect them to move the Pareto frontier as well if they're ever added, based on how well they've performed in general.


> Wild differences in Elo compared to TFA's graph

Because those are two different, completely independent Elos... the one you linked is for LMArena, not Codeforces.


> Very impressed by the Unsloth team releasing the GGUFs so quickly; if it's like the Qwen 3.5 release, I'll wait a few more days in case they make a major update.

Same here. I can't wait until mlx-community releases MLX-optimized versions of these models as well, but I'm happily running the GGUFs in the meantime!

Edit: And looks like some of them are up!


Absolute n00b here, very confused about the many variations; it looks like the Mac-optimized MLX versions aren't available in Ollama yet (I mostly use Claude Code with this).


The benchmarks showing the "old" Chinese Qwen models performing basically on par with this fancy new release kinda have me thinking the Google models are DOA, no? What am I missing?

