
Impact illustration:

> [...] the contents of the entire memory to be read over time, explains Rüegge. “We can trigger the error repeatedly and achieve a readout speed of over 5000 bytes per second.” In the event of an attack, therefore, it is only a matter of time before the information in the entire CPU memory falls into the wrong hands.



Prepare for another dive maneuver in the benchmarks department, I guess.


We need software and hardware to cooperate on this. Specifically, threads from different security contexts shouldn't get assigned to the same core. If we guarantee this, the fences/flushes/other clearing of shared state can be limited to kernel calls and process lifetime events, leaving all the benefits of caching and speculative execution on the table for things actually doing heavy lifting without worrying about side channel leaks.
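As a sketch of what the software half could look like today (my illustration, assuming Linux 5.14+ with CONFIG_SCHED_CORE; not something the article proposes), a process can install a core-scheduling "cookie" so the kernel never co-schedules its threads on an SMT core together with threads from a different security context:

  /* Minimal sketch, assuming Linux 5.14+ with core scheduling enabled. */
  #define _GNU_SOURCE
  #include <stdio.h>
  #include <sys/prctl.h>
  #include <linux/prctl.h>

  int main(void) {
      /* Tag this whole thread group with a fresh core-scheduling cookie.
       * The scheduler will then refuse to run these threads on the same
       * SMT core as threads holding a different (or no) cookie. */
      if (prctl(PR_SCHED_CORE, PR_SCHED_CORE_CREATE, 0,
                PR_SCHED_CORE_SCOPE_THREAD_GROUP, 0) != 0) {
          perror("PR_SCHED_CORE_CREATE");
          return 1;
      }
      puts("core-scheduling cookie installed");
      return 0;
  }

That only covers the SMT-sibling half; time-sliced sharing of one core still needs the flush-on-kernel-call/context-switch part described above.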


I get you, but devs struggle to configure nginx to serve their overflowing cauldrons of 3rd-party npm modules of witches' incantations. Getting them to securely design and develop security-labelled, cgroup-based micro (nano?) compute services for inferencing text of various security levels is beyond even 95% of coders. I'd posit that it would be a herculean effort even for 1% devs.

Just fix the processors?


It's not a "just" if the fix cripples performance; it's a tradeoff. It is forced to hurt everything everywhere because the processor alone has no mechanism to determine when the mitigation is actually required and when it is not. It is 2025 and security is part of our world; we need to bake it right into how we think about processor/software interaction instead of attempting to bolt it on after the fact. We learned that lesson for internet facing software decades ago. It's about time we learned it here as well.


Is the juice worth the squeeze? Not everything needs Orange Book (DoD 5200.28-STD) Class B1 systems.


how will this prevent JavaScript from leaking my password manager database?


And if not, why did they introduce severe bugs for a tiny performance improvement?


It's not tiny. Speculative execution usually makes code run 10-50% faster, depending on how many branches there are.


Yeah… folks who think this is just some easy-to-avoid thing should go look around and find the processor without branch prediction that they want to use.

On the bright side, they will get to enjoy a much better music scene, because they’ll be visiting the 90’s.


> Does Branch Privilege Injection affect non-Intel CPUs?

> No. Our analysis has not found any issues on the evaluated AMD and ARM systems.


IBM Stretch had branch prediction. Pentium in the early 1990s had it. It's a huge win with any pipelining.


That's a vast underestimate. Putting in lfence before every branch is on the order of 10X slowdown.
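To picture what that looks like (my illustration, not a measurement): a blanket mitigation of that kind means something like an lfence in front of every conditional branch, so a hot loop stops speculating entirely:

  #include <emmintrin.h>   /* _mm_lfence, x86 SSE2 */

  /* Illustrative only: fencing before every branch, the way a blanket
   * "no speculation past branches" mitigation would. Each fence waits for
   * prior instructions to complete before the compare/branch can issue,
   * so the pipeline spends most of its time drained. */
  long sum_positive(const int *v, long n) {
      long sum = 0;
      for (long i = 0; i < n; i++) {
          _mm_lfence();          /* serialize before the data-dependent branch */
          if (v[i] > 0)
              sum += v[i];
      }
      return sum;
  }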


There is of course a slight chicken-and-egg thing here: if there were no (dynamic) branch prediction, we (as in compilers) would emit different code that is faster for non-predicting CPUs (and presumably slower for predicting CPUs). That would mitigate a bit of that 10x.
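A small illustration of what "different code" means here (my own example): on a CPU that can't predict, a compiler would lean on branch-free selects instead of conditional jumps wherever it can:

  /* Branchy version: typically compiles to a compare + conditional jump,
   * which stalls a non-predicting pipeline at every call. */
  int max_branchy(int a, int b) {
      return (a > b) ? a : b;
  }

  /* Branch-free version: the compare is materialized as a mask and used
   * for a select (cmov-style), so the front end has nothing to guess. */
  int max_branchless(int a, int b) {
      int mask = -(a > b);              /* all ones if a > b, else zero */
      return (a & mask) | (b & ~mask);
  }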


A bit. I think we've shown time and time again that letting the compiler do what the CPU is doing doesn't work out, most recently with Itanium.


The issue is with indirect branches. Most branches are direct ones.
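To make that distinction concrete (my example, not from the write-up): a direct branch has its target baked into the instruction, while an indirect branch takes its target from data at run time, and it is the predictor state for the latter that branch-target/privilege injection attacks poison:

  #include <stdio.h>

  static void handler_a(void) { puts("a"); }
  static void handler_b(void) { puts("b"); }

  int main(void) {
      int x = 1;

      /* Direct branch: the target address is fixed in the instruction stream. */
      if (x)
          handler_a();

      /* Indirect branch: the target comes from memory (function pointers,
       * vtables, switch jump tables), so the CPU must predict where it
       * will land. */
      void (*table[])(void) = { handler_a, handler_b };
      table[x]();

      return 0;
  }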


Of course I know that.

But if the fix for this bug (how many security holes have there been now in Intel CPUs? 10?) brings only a couple % performance loss, like most of them so far, how can you even justify that at all? Isn't there a fundamental issue in there?


How much improvement would there still be if we weren't so lazy when it comes to writing software? If we worked to get as much performance out of the machines as possible and avoided useless bloat, instead of just counting on the hardware being "good enough" to handle the slowness with some grace.


A modern processor pipeline is dozens of cycles deep. Without branch prediction, we would need to know the next instruction at all times before beginning to fetch it. So we couldn’t begin fetching anything until the current instruction is decoded and we know it’s not a branch or jump. Even more seriously, if it is a branch, we would need to stall the pipeline and not do anything until the instruction finishes executing and we know whether it’s taken or not (possibly dozens of cycles later, or hundreds if it depends on a memory access). Stalling for so many cycles on every branch is totally incompatible with any kind of modern performance. If you want a processor that works this way, buy a microcontroller.


But branch prediction doesn't necessarily need complicated logic. If I remember correctly (it's been 20 years since I read any papers on it), the simple heuristic "backward relative branches are taken, forward and absolute branches are not" could achieve 70-80% of the performance of the state-of-the-art implementations back then.
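As an illustration (my sketch) of why that static rule does so well: ordinary code layout already matches it, with loop back-edges as backward branches that are taken on almost every iteration and rare error paths as forward branches that are almost never taken:

  #include <stddef.h>

  /* The loop's back-edge is a backward branch: "backwards taken" is right
   * on every iteration except the last. The error check is a forward
   * branch: "forwards not taken" is right except in the rare failure case.
   * __builtin_expect (GCC/Clang) helps the compiler keep that layout. */
  long checksum(const unsigned char *buf, long n) {
      if (__builtin_expect(buf == NULL, 0))   /* forward branch, rarely taken */
          return -1;
      long sum = 0;
      for (long i = 0; i < n; i++)            /* backward branch, usually taken */
          sum += buf[i];
      return sum;
  }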


Do you mean overall or localized to branch prediction? Assuming all of that is true, you're talking about a 20-30% performance hit?


> If you want a processor that works this way, buy a microcontroller.

The ARM Cortex-R5F and Cortex-M7, to name a few, have branch predictors as well, for what it’s worth ;)


You can still have a static branch predictor. That has surprisingly good coverage. I'm not saying this is a great idea, just pointing it out.



