Hacker News | bri3d's comments

AMD Software Engineers giving AMD Stupid Gaming Accessory Software Engineers access to a signing system backed by PSP seems like a much worse outcome than trusting HTTPS, really. Like, there are definitely intelligent and secure ways to do this, but this one in particular is overkill with a huge blast radius when it is (invariably) done incorrectly.

Actual write-up rather than overwrought YouTube drama: https://mrbruh.com/amd2/

A non-default-installation set of AMD tools (Ryzen Master and probably others) had an auto-updater which used HTTP instead of HTTPS. It's clear this is a feature they'd basically forgotten about; it even pointed to an ATI domain. A third-party bug bounty company rejected it because MITM was out of scope. AMD are incompetent at making software (news at 11), kept asking for extensions, and took an incredible amount of time to deal with it. Eventually they removed this updater entirely and replaced it with one in the app (rather than the installer) that uses HTTPS + a CRC32 (for some reason). The initial vuln was very stupid and should have been fixed faster. As for the current system, if you're mad about HTTPS-protected auto-updaters (which is valid), you've probably got a lot of them to go to war against.
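On the "HTTPS + a CRC32" part: CRC32 catches accidental corruption, but it isn't a tamper check. Here's a minimal Python sketch of why, assuming nothing about AMD's actual updater (the `verify_update` helper and the payloads are made up for illustration). CRC32 is affine over XOR, so the checksums of equal-length messages combine predictably, which a cryptographic hash like SHA-256 doesn't allow.

```python
import hashlib
import hmac
import zlib


def xor_bytes(*chunks: bytes) -> bytes:
    """XOR equal-length byte strings together."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            out[i] ^= b
    return bytes(out)


def crc32_is_affine(a: bytes, b: bytes, c: bytes) -> bool:
    """For equal-length inputs, crc32(a^b^c) == crc32(a) ^ crc32(b) ^ crc32(c)."""
    combined = zlib.crc32(xor_bytes(a, b, c))
    return combined == zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)


def verify_update(payload: bytes, expected_sha256_hex: str) -> bool:
    """Hypothetical check: compare a download against a pinned SHA-256 digest."""
    actual = hashlib.sha256(payload).hexdigest()
    # Constant-time comparison, out of habit more than necessity here.
    return hmac.compare_digest(actual, expected_sha256_hex)
```

A pinned digest only helps if the digest itself arrives over a trusted channel; a signature check is the usual next step, but that's beyond this sketch.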


Why does everyone focus on this aspect? Why is this surprising? Do people think that 100% or even 20% of Twitter employees were SREs? Do you think that most large applications are kept alive by constant manual toil from SREs? (ok, ok some are - but still!)

What's funny is that Twitter SRE used to be horrible and the app probably would have collapsed entirely (rather than the little bit that it did) without hundreds of manual operators, but in the few years leading up to the "acquisition," it improved massively, to the point that they literally automated themselves out of a job.

Anyway, Twitter had thousands of engineers, salespeople, support people, and so on. They were working on tens of new products in an attempt to find more revenue (everything from clones of every single social media app you can imagine to becoming a sports TV network), and on the other side (Goldbird), selling and supporting ads, the thing that made Twitter money.

The metric to look at isn't uptime; it makes no sense that people keep parading it. The metrics that matter are revenue and revenue growth, and surprise! By most available measures, the Elon strategy torpedoed those.

Twitter was, like almost every "web" company in ~2020, a very "fat" company because they were re-investing free ZIRP money in future growth. Elon turned it into a KTLO operation, and didn't even manage to succeed at the standard PE-style "fat" company slim-down (where you chop growth initiatives and keep the revenue, like everyone else is doing now), because he also chopped the revenue side.


I've always wanted my own VW diagnostic tool suite, and between tooling that was released in public on GitHub (https://github.com/kartoffelpflanze/ODIS-project-explorer) and my own research from years ago, it always seemed straightforward but too tedious to execute on. Claude did a great job making something useful, https://github.com/bri3d/mcd-diag-rs , and now I don't have to find a Windows machine or remember a specific diagnostic cable to replace my brake pads.

I also build a ton of household glue stuff; I was never really passionate enough about the whole "homeserver" thing to spend the effort in going beyond basic video recording for my security system, but now I have all of my local-only home automation stuff wired together, mostly into HomeKit, and have been able to ditch a ton of cloud services.


elaborate on the home automation pls

I used the usual stack of HomeAssistant, Frigate, and a bazillion glue connectors to consolidate devices that were previously cloud apps on my phone.

For example, previously I used the Frigate web UI over VPN to access my cameras, but I instead had Claude help set up go2rtc to push the video to HomeKit. I used the Eufy app for my doorbell, which I was instead able to integrate into HomeAssistant and then push to HomeKit as well, and I was also able to integrate some TP-Link Kasa plugs with HomeAssistant too.

This is all stuff I could have done relatively easily myself, but I hate nothing more than wading through horribly documented disparate configuration systems (I do that enough all day), so having Claude to do it for me is what unlocked it actually getting done.

I also added on to some custom one-offs like an ESP32-based controller for my kid's RGB LED nightlight for fun, although that stuff was more hand-coded since I find it enjoyable.


LLM rant aside, I think this comment is built on a fundamental misunderstanding of what this is. This kind of runtime is an advanced debugging tool for exploring extreme edge cases in a repeatable way, not the thing you’d run your production apps on.

I mean, if I'm in a situation in which I need more powerful debugging tools than what I already have available, then I kind of want to know those tools have been built with care. If I'm deep in the weeds of trying to deliver and I'm desperately reaching for something outside of the standard tool set, I'd like the person who made that tool to have crafted it in a dwarven forge under a pale moonlight. Not to have thrown up their hands and admitted that a tool with the intelligence of a smart working dog breed is smarter than them.

I mean, there is a reason the MIT licence contains these words:

> THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND… INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF… FITNESS FOR A PARTICULAR PURPOSE…

If you would like a tool built with my organic artisanal human fingers, then I am certainly open to sufficiently large offers of money to build one for you! Alternatively, you can simply not use it if you think it won't fit your needs :)


It’s a post-boot authentication bypass exploit. Any post-boot authentication bypass exploit against TPM-only sealed BitLocker effectively bypasses it. The user doesn’t have a key to start with in this setup, just the machine.

This exploit is cool but there are similar exploits discovered in any given year and nothing really reeks of a backdoor; this one seems to be gaining attention mostly because Microsoft’s robo-call level initial response caused the researcher to dramatically crash out.


For a hobbyist, they're quite nice for "I want to SSH into a thing, write my tools on Linux, and still have access to SPI / I2C / GPIO, USB, and whatever hats can plug into that." The Hat form factor, while not technically great and frequently overpriced, is also nice for distribution; both inside a team commercially and on the web as a hobbyist, it's a lot easier to share software and say "hey, buy this pi, this hat, and run this" than "fab this PCB / solder this bundle of nonsense to an ESP."

For a company, they're also nice for "I want to make an IoT device that's heavier weight than an ESP32 and/or I only want to hire Linux Application People and not Firmware People; what's the cheapest Linux module I can get that's widely supported, backed by a real company, and has regulatory approvals" - Pi Zero W. My understanding is this exact pattern is why it's harder to get them as a hobbyist.

I use them widely in automotive reverse engineering; ESP32 can and does work just as well or better for an end product, but for experiments it's really nice to have a self-contained appliance to SSH into and use SocketCAN on rather than some bespoke firmware project to manage and iterate on.
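For anyone who hasn't tried it, here's roughly what "SSH in and use SocketCAN" looks like with nothing but the Python standard library. This is a sketch under assumptions: the interface name `can0` and the OBD-II RPM request are just examples, and the helper names are mine. The socket part only works on Linux with a CAN interface already brought up.

```python
import socket
import struct

# Layout of Linux's struct can_frame: 32-bit ID, 8-bit length, 3 pad bytes, 8 data bytes.
CAN_FRAME_FMT = "=IB3x8s"
CAN_FRAME_SIZE = struct.calcsize(CAN_FRAME_FMT)


def build_frame(can_id: int, data: bytes) -> bytes:
    """Pack a classic CAN frame, zero-padding the payload to 8 bytes."""
    if len(data) > 8:
        raise ValueError("classic CAN frames carry at most 8 data bytes")
    return struct.pack(CAN_FRAME_FMT, can_id, len(data), data.ljust(8, b"\x00"))


def parse_frame(frame: bytes) -> tuple[int, bytes]:
    """Unpack a frame into (can_id, payload); can_id may include flag bits."""
    can_id, length, data = struct.unpack(CAN_FRAME_FMT, frame)
    return can_id, data[:length]


def send_and_dump(interface: str = "can0", count: int = 10) -> None:
    """Linux only: send one request, then print the next `count` frames seen."""
    with socket.socket(socket.AF_CAN, socket.SOCK_RAW, socket.CAN_RAW) as sock:
        sock.bind((interface,))
        # Example request: OBD-II mode 0x01, PID 0x0C (engine RPM) to 0x7DF.
        sock.send(build_frame(0x7DF, bytes([0x02, 0x01, 0x0C, 0, 0, 0, 0, 0])))
        for _ in range(count):
            can_id, data = parse_frame(sock.recv(CAN_FRAME_SIZE))
            print(f"{can_id:03X}  {data.hex(' ')}")
```

Calling `send_and_dump("can0")` over SSH is the whole iteration loop, which is the appeal compared with reflashing firmware for every experiment.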

Given the price and availability issues I suspect the market is "correcting" a bit and companies are hiring Firmware People and switching to true MCUs in places they'd previously have avoided doing so, but it was definitely a thing for a long time.


Yes? It is regularly; either the firmware or the OS can deliver updates depending on configuration. The Raptor Lake CPUs in question have gone through an enormous number of microcode revisions already due to quite famous voltage scaling issues; it's unclear if this erratum is fallout from or related to a similar root cause or just another issue with the processor.
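If you want to see which microcode revision you're actually running, x86 Linux reports it per logical CPU in `/proc/cpuinfo`. A small sketch, assuming that file format; the helper names are mine, and other platforms won't have the field.

```python
from pathlib import Path


def microcode_revisions(cpuinfo_text: str) -> set[str]:
    """Collect the distinct 'microcode' values found in /proc/cpuinfo text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(value.strip())
    return revisions


def local_microcode_revisions() -> set[str]:
    """x86 Linux only: read the revisions for the machine this runs on."""
    return microcode_revisions(Path("/proc/cpuinfo").read_text())
```

Comparing the result before and after a BIOS or OS update is a quick way to tell which of the two actually delivered a new revision.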


There's another blog post going into more depth about the issue here: https://fgiesen.wordpress.com/2025/05/21/oodle-2-9-14-and-in... where they speculate that it relates both to other clock-related instability on specific Raptor Lake parts and possibly to the overarching voltage control problems that the platform had early on. I can't tell entirely from the bug reports whether the behavior reliably reproduces on 100% of Raptor Lakes, but the indicators I'm reading suggest that it doesn't. It is concerning that Intel didn't get back to Mozilla about it though, since it's certainly a lot more than a one-off.


Linked in the Bugzilla thread is a really nice in depth investigation of the same issue with high register aliases in a similar algorithm (Huffman coding) but in an entirely different product: https://fgiesen.wordpress.com/2025/05/21/oodle-2-9-14-and-in... .

It's concerning that Intel don't seem to have been responsive to anyone with respect to this issue and it doesn't appear to have an official erratum yet, although Raptor Lake was the Intel CPU with voltage issues and basically random bit rot, so I suppose it's hard to tell if this is a silicon-level erratum caused by bad design or by some kind of post-manufacturing damage. Raptor Lake in general causes enough non-reproducible noise that I believe Firefox gave up on automated crash reports from it ( https://bugzilla.mozilla.org/show_bug.cgi?id=1975808 ).

EDIT: I read that Oodle article (which is SO good!) again and realized that their customer-provided reproduction of the bug was directly linked to boost clock speeds (the customer said that overclocking by 5% made it happen entirely reliably), so this is definitely not a "the architecture has a 100% bug in it" but rather some deeper issue with clock propagation that appears at edge cases.


Read the Oodle article in full, fantastic investigation indeed!

It also looks like there's a slight difference in the unwanted effect both companies have reported, despite the bug being seemingly triggered the same way (mov touching the high byte):

- Oodle reports that a low byte is occasionally stored in the intended location.

- Mozilla's fix suggests that a full 16-bit value is stored instead, corrupting an adjacent variable! This could have much more serious consequences.

Technically, this could still be the same exact bug. I found no mention of the order the output buffer was accessed in by the Huffman decoder debugged in the Oodle report, and, since it was a contiguous buffer, it's easy to mistake an occasional out-of-bounds copy there for a copy from a wrong location. But if both analyses are correct, the behavior of high byte accesses on Raptor Lake is way less predictable than those fixes suggest. Haven't managed to find an official erratum from Intel.
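To make the difference between the two reported symptoms concrete, here's a toy Python model of a buffer with an adjacent byte next to the intended slot. It only illustrates the descriptions above and is not a reproduction of what the CPU does; the function names and the byte order of the bad 16-bit store are my assumptions.

```python
def store_high_byte(buf: bytearray, offset: int, reg16: int) -> None:
    """Intended behavior: write only bits 8..15 of the register (e.g. AH of AX)."""
    buf[offset] = (reg16 >> 8) & 0xFF


def store_low_byte(buf: bytearray, offset: int, reg16: int) -> None:
    """Symptom as described for Oodle: the low byte lands in the intended slot."""
    buf[offset] = reg16 & 0xFF


def store_full_word(buf: bytearray, offset: int, reg16: int) -> None:
    """Symptom implied by Mozilla's fix: 16 bits written, clobbering offset + 1."""
    buf[offset:offset + 2] = reg16.to_bytes(2, "little")


def run(store) -> bytes:
    """Apply one store to a 2-byte buffer whose second byte is a 'neighbor'."""
    buf = bytearray(b"\x00\xEE")  # slot 0 is the target, 0xEE is adjacent data
    store(buf, 0, 0xABCD)
    return bytes(buf)
```

In this model the first two cases leave the neighbor at `0xEE` and differ only in the value stored, while the third overwrites it, which is why the second symptom is the scarier one.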


It's very interesting because my 13900K has worked like a dream from day one and still to this day. Never had any of the voltage issues, never had any abnormal crashes in Firefox or any other software. I was undervolting it for a long while, so I wonder if somehow that saved me from the voltage issues before they were fixed?


I remember Puget systems pointed to this same thing when they analyzed the issues back in the day when it was blowing up.

https://www.pugetsystems.com/blog/2024/08/02/puget-systems-p...


Undervolting would definitely help, and is the actual fix. The current Intel fixes were mostly just for the symptoms, as the main issue is high voltage+power when pushing high clocks, but they can't actually fix that as it'd downgrade the advertised clocks the CPUs were sold with.


Sorry, but that understanding is dangerously incomplete. You're describing the first set of issues they uncovered, but there's also:

"Microcode and BIOS code requesting elevated core voltages which can cause Vmin shift especially during periods of idle and/or light activity" (emphasis mine)

https://community.intel.com/t5/Blogs/Tech-Innovation/Client/...

Recall also that "Vmin shift" means "the minimum voltage the processor needs to run correctly goes up" so if the issue isn't addressed, that level of undervolt may stop working


Not sure what's supposed to be wrong with that? The clock tree degrades at high voltage. Some theories I've seen have the CPU requesting significantly higher voltages during alternating clocks when there's a short lull in load from e.g. a pipeline stall. Then there doesn't seem to be a good enough sensor net in the correct places for the CPU to react to this, so it just "burns" itself down gradually. Assuming these are true, an actual fix from Intel would mean either relaxing boost clocks to ones that are universally safe, opening themselves up to a lawsuit from everyone that bought the high-end SKUs, or doing a new stepping, which is extremely expensive for a finished design.

When the CPU degrades, it naturally needs higher voltages to be stable, until the point where it just breaks completely and no amount of voltage will help it. But if your CPU doesn't degrade because it hasn't been overdoing it on voltages, then there'll be no Vmin shift to cause issues.

As an anecdotal experience from someone I know that runs these in prod for game servers, limiting the CPU to 80°C, 1.4V-1.45V, and 400A has been keeping them alive for years doing 24/7 loads. Maybe go a bit lower on the voltage if you want to be sure longer term, as they are fine with just mass RMAing these. There's also a large amount of variation in silicon quality between samples: one can run cool and completely fine even at the old stock settings, while another has to pull say 1.5x the power for the same load and clocks, which makes it degrade.


You're implying that if you don't run the CPU at high power and high heat it won't have problems, and that undervolting or underclocking will prevent damage. This is not correct: while that is helpful, Vmin degradation occurs during idle or light activity as well

Vmin will creep up, and the headroom for undervolting will degrade. It will affect the high clocks first (they demand the highest voltage), which is why dropping the max boost multiplier a step or two can also work around it (at the cost of basically downgrading it to a cheaper processor)


Idle and light load is bad for degradation only because that's the most common scenario where the boosting algorithm will actually go to the highest clocks. More loaded cores will have the CPU target lower clocks on all cores so that it actually can get the power for it and have the CPU be coolable, but if you're idle and then some task loads just a single core for a bit, the CPU will boost it the highest it can. The voltage spikes from those boosts will cause local hotspots even if the CPU is cool overall.


Or perhaps it is more complicated than that

"Even under idle conditions at relatively cool temperatures, sporadic elevated voltages are observed when the processor is resumed from low power states in order to service background operations before entering a low power state again."

https://www.igorslab.de/en/search-for-the-solution-to-raptor...


My 1360p and 13400 seem fine too. I applied the microcode and firmware updates when they came out... but I'm guessing it didn't affect all skus equally for whatever magical reason.


My 13900K was affected by the widespread voltage issue and had to be replaced, but since then I have had zero problems with it.

