I more meant that it could maybe see a significant difference in the road, and know to take caution, not to try to gauge the depth of a submerged roadway.
- media/tech shortened content: shorter tv shows, short video content, etc.
(Tiktok is the "state of the art" of those 2 trends pushed to the max)
Specifically, we're getting more and more addicted to things that increase the frequency of dopamine spikes, making it increasingly difficult to get into deep-focus work.
Absolutely, we are feeding kids so many attention-span-killing things. Even as an adult I'm having a hard time with YouTube Shorts, and I cannot imagine a kid's brain having the tools to deal with all that.
I am _fighting_ with elderly relatives to adjust their YouTube habits. They didn't even know autoplay is on by default. They don't even check sources; they just let the garbage in.
I wonder if there's research on short form but educational content or if that's fundamentally impossible.
For example I remember reading a lot of science magazines / articles growing up (granted popsci but for a kid it still teaches some things) and as I grew up things like the Economist.
Similarly, I also played games like Math Blaster as a kid, and I've realized I need to intentionally provide my kids with games like this that ideally teach something (the bar being greater-than-zero learning), rather than letting them play one of those infinite-runner games or whatever.
I think we're probably talking about the exact same thing, but I'm curious where the line between educational content and short-form media sits.
That doesn't line up, though. If you're 13 and meeting the level in 2012, your scores don't decline, so the levels would lag a few years. The 8-year-olds who show up and miss the mark in 2017 are the ones indicating that infinite scroll was taking a toll on them.
Additionally, this would start to show in class-specific measurements (those kids with access to home internet, personal devices, etc. would have worse scores). I think the argument about social media has merit in discussions of children, but it seems more of a social distinction than an objective indicator of academic performance.
But isn't prefill speed the bottleneck in some systems*?
Sure, it's an order of magnitude faster (10x on Apple Metal?), but there's also an order of magnitude more tokens to process, especially for tasks involving summarization of some sort.
But point taken that the parent's numbers are probably decode.
* Specifically, Mac metal, which is what parent numbers are about
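To make that tradeoff concrete, here's a back-of-the-envelope sketch with made-up throughput numbers (purely illustrative, not measurements from any actual Mac):

```python
# Hypothetical throughput figures, for illustration only (not measured):
prefill_tok_per_s = 500.0   # prompt processing, ~10x decode speed
decode_tok_per_s = 50.0     # token generation

# A typical summarization task: long input, short output.
prompt_tokens = 20_000
output_tokens = 500

prefill_time = prompt_tokens / prefill_tok_per_s   # seconds spent on prefill
decode_time = output_tokens / decode_tok_per_s     # seconds spent decoding

print(f"prefill: {prefill_time:.0f}s, decode: {decode_time:.0f}s")
# Even at 10x the per-token speed, prefill dominates wall-clock time
# whenever the prompt is much longer than the output.
```

With these (invented) numbers, prefill takes 40s versus 10s for decode, so the 10x per-token advantage doesn't save you when the input is 40x the output.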
MLX makes prefill effectively instantaneous on a Mac.
Storing an LRU KV cache of all your conversations, both in memory and on (plenty fast) SSD, and especially including the fixed agent context every conversation starts with, means we go from "painfully slow" to "faster than using Claude" most of the time. It's kind of shocking this much perf was lying on the ground waiting to be picked up.
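A minimal sketch of that caching idea, assuming a made-up `KVCacheStore` class (not any real library's API): keep the hottest conversation caches in RAM in LRU order and spill evictions to SSD, so a returning conversation skips prefill entirely.

```python
import os
import pickle
import tempfile
from collections import OrderedDict

class KVCacheStore:
    """Toy LRU store: hot KV caches in RAM, evictions spilled to SSD."""

    def __init__(self, max_in_memory=4, spill_dir=None):
        self.max_in_memory = max_in_memory
        self.hot = OrderedDict()                 # conversation_id -> kv_state
        self.spill_dir = spill_dir or tempfile.mkdtemp()

    def _path(self, conv_id):
        return os.path.join(self.spill_dir, f"{conv_id}.kv")

    def put(self, conv_id, kv_state):
        self.hot[conv_id] = kv_state
        self.hot.move_to_end(conv_id)            # mark most recently used
        while len(self.hot) > self.max_in_memory:
            old_id, old_state = self.hot.popitem(last=False)
            with open(self._path(old_id), "wb") as f:
                pickle.dump(old_state, f)        # spill LRU entry to disk

    def get(self, conv_id):
        if conv_id in self.hot:                  # RAM hit: no prefill needed
            self.hot.move_to_end(conv_id)
            return self.hot[conv_id]
        path = self._path(conv_id)
        if os.path.exists(path):                 # SSD hit: reload beats re-prefill
            with open(path, "rb") as f:
                state = pickle.load(f)
            self.put(conv_id, state)
            return state
        return None                              # miss: must prefill from scratch
```

Keying one entry on the fixed agent preamble means even brand-new conversations start from an already-computed prefix instead of re-processing it every time.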
Open models are still dumber than leading closed models, especially for editing existing code. But I use it as essentially free "analyze this code, look for problem <x|y|z>" which Claude is happy to do for an enormous amount of consumed tokens.
But speed is no longer a problem. It's pretty awesome over here in unified memory Mac land :)
Some might be tempted to brush aside the fact that the server Linux threat model is very different from the desktop Linux one (and to snarkily reply "well, it's powering the vast majority of GDP via all of AWS, Azure, etc.").
However, comparing apples to apples, what makes you say this isn't ready for government use when it's ready for the majority of the workforce at trillion-dollar big tech companies (aside from Microsoft and Apple, obviously)? Large employers like IBM must also be using Red Hat or some other distro.
Google, for example, uses a fork of Ubuntu. When someone decided to compromise Google employees' machines via a fake npm package, they were able to do so successfully. When they reported this to Google, Google said it was okay for employee machines to be compromised and that this was part of its threat model. While this may be true for large companies, I don't think the French government is ready to handle such a security model.
> I don't know how to force this issue as a European. There are just too many levels of abstraction between me and Brussels.
> EU moves so much faster when it comes to regulations like forcing all of us in Denmark to use timesheets, annoying lids on our bottles, and invasive surveillance laws.
Rediscovering the principle of subsidiarity from first principles...
> I'll need to investigate further but it doesn't seem promising.
That's what I meant by "waiting a few days for updates" in my other comment. When Qwen 3.5 was released, I remember a lot of complaints like "tool calling isn't working properly", etc.
That was fixed shortly after: there was some template-parsing work in llama.cpp, and Unsloth pulled some models and re-released better ones; I can't quite remember exactly what was improved, better quantization or something...
The model does call tools successfully with sensible parameters, but it doesn't seem to pick the right ones in the right order.
I'll try again in a few days. It's great to be able to test it just hours after the release, but it's the bleeding edge, as I had to pull the latest from main. And with all the supply-chain issues happening everywhere, the bleeding edge is always riskier from a security point of view.
There's also always the possibility of fine-tuning the model later to make sure it can complete the custom task correctly, but the code for doing a LoRA on Gemma 4 is probably not yet available. The 50% extra speed seems really tempting.
(Comparing Q3.5-27B to G4 26B A4B and G4 31B specifically)
I'd assume Q3.5-35B-A3B would perform worse than the dense Q3.5 27B model, but the cards you pasted above somehow show that for Elo and TAU2 it's the other way around...
Very impressed by the Unsloth team releasing the GGUFs so quickly; if it's like Qwen 3.5, I'll wait a few more days in case they make a major update.
Overall great news if it's at parity with or slightly better than the Qwen 3.5 open weights; I hope to see both of these evolve in the sub-32GB-RAM space. Disappointed in Mistral/Ministral being so far behind these US & Chinese models.
That Pareto plot doesn't seem to include the Gemma 4 models anywhere (not just not at the frontier), likely because pricing wasn't available when the chart was generated. At least, I can't find the Gemma 4 models there. So it's not particularly relevant until it's updated for the models released today.
Gemma 4 31B has now wiped out several of those models from the Pareto frontier, now that it has pricing. Gemma 4 26B A4B has an Elo, but no pricing, so it still isn't on that chart. The Gemma 4 E2B/E4B models still aren't on the arena at all, but I expect them to move the Pareto frontier as well if they're ever added, based on how well they've performed in general.
> Very impressed by the Unsloth team releasing the GGUFs so quickly; if it's like Qwen 3.5, I'll wait a few more days in case they make a major update.
Same here. I can't wait until mlx-community releases MLX optimized versions of these models as well, but happily running the GGUFs in the meantime!
absolute n00b here is very confused about the many variations; it looks like the Mac-optimized MLX versions aren't available in Ollama yet (I mostly use Claude Code with this)
the benchmarks showing the "old" Chinese Qwen models performing basically on par with this fancy new release kinda has me thinking the Google models are DOA, no? what am I missing?
You think you know how deep the water is because you've taken that road many times before (or, in your case, you have historical laser measurements).
But you don't know:
- Maybe the road underneath has fully collapsed
- Maybe the flow of water is extremely strong, so you need to accurately estimate that too.