I know next to nothing about processors at this level, but I wonder if it would be possible for a skilled engineer to try to find these instructions by scrutinizing the actual physical instruction decoder on the chip and/or inspect the processor's microcode. Are these things possible to do? If they are, is it feasible to reverse engineer them?
A team from Ruhr University Bochum reverse-engineered the microcode for the AMD K8 and K10 to implement custom microcode programs, they describe how they reverse-engineered the ROM here: https://www.syssec.ruhr-uni-bochum.de/media/emma/veroeffentl.... The problem with reverse-engineering on newer CPUs is that on both Intel and AMD the updates are now protected with cryptographic authentication, so you can't run arbitrary custom microcode to aid in understanding what it does. And as others have mentioned, the hardware complexity and small feature sizes make reverse-engineering the microcode engine or ROM by physical inspection much harder than on the K8. I expect it will be achieved at some point, though.
Ryzenfall and Masterkey together provide a complete break of the PSP authentication with arbitrary code execution if you are running a vulnerable AGESA.
You will of course have to flash back down to a vulnerable version, but it should give you what you need. Also, as this was fixed prior to Zen 2, you wouldn't be able to test their newer chips.
Basically, AMD has a buffer overflow in an SMM call, and they conveniently signed an empty PSP module that will nevertheless be executed.
Yeah, that's right - you could prove the absence of malicious microcode, but not the absence of hardware trojans, implants, etc. There are also the embedded microcontrollers to deal with (e.g. the Management Engine, Innovation Engine, and Power Management Controller on Intel).
transistors are placed by software, not humans anymore
If anything I'd say that makes it even easier to reverse-engineer, since the layout is far more regular. There are some public tools to do this already - here's one that immediately comes to mind: https://degate.org/
This was also linked elsewhere but a security researcher was able to identify a significant number of undocumented instructions: https://www.youtube.com/watch?v=KrksBdWcZgQ
What kind of complexity are we looking at, roughly? Surely someone with deep pockets and the necessary expertise would be interested in trying to find these kinds of things, no?
A modern CPU has a transistor count in the billions to low tens of billions. I haven't really thought about it, but I'm tempted to say that looking at the decoder stage(s) alone won't do. Undocumented operation doesn't have to take the form of an entire undocumented instruction. You could design the device so that the "right thing" happens simply by scheduling the right instruction, with the right arguments, under the right conditions (the right execution unit, the right amount of pipeline clog, etc.). The whole thing is significantly more complex than the "fetch-decode-execute" diagrams would have you believe -- execution isn't strictly sequential, executing exactly the same instruction won't cause the exact same transistors to "fire" each time, and so on.
So the level of complexity is pretty daunting. IMHO if you want to find out undocumented behaviour that was deliberately introduced, you're better off looking at other methods, no matter how deep your pockets are.
Certainly. But even those can be routed so that the WR signal for a particular address also doubles as half of the AND input which causes a read from another range to always return zero, for example. (That's not an "undocumented opcode", of course, but it can be used maliciously). It's certainly not easy to do this kind of meaningful obfuscation though, especially between different blocks, since different blocks are usually in different clock domains, too.
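To make the wiring trick above concrete, here is a toy software model (all names and addresses are hypothetical, purely for illustration): the write-enable pulse for one "trigger" address also sets a hidden latch, and that latch forms half of an AND input that forces every read from a victim range to return zero.

```python
class TrojanBus:
    """Toy model of the malicious routing described above: the WR signal
    for one trigger address doubles as half of an AND input that masks
    reads from a separate victim range. Addresses are made up."""
    TRIGGER = 0x1000
    VICTIM = range(0x2000, 0x2100)

    def __init__(self):
        self.mem = {}
        self.latch = False  # the hidden half of the AND input

    def write(self, addr, value):
        self.mem[addr] = value
        if addr == self.TRIGGER:   # the WR pulse also sets the latch
            self.latch = True

    def read(self, addr):
        if self.latch and addr in self.VICTIM:
            return 0               # victim range now always reads as zero
        return self.mem.get(addr, 0)

bus = TrojanBus()
bus.write(0x2000, 42)
print(bus.read(0x2000))        # normal behaviour before the trigger fires
bus.write(TrojanBus.TRIGGER, 1)
print(bus.read(0x2000))        # same address after the trigger: masked
```

Nothing here is an "undocumented opcode"; the point is that perfectly ordinary-looking writes can arm behaviour that no decoder-level inspection would reveal.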
Edit: sorry, my neurons got all jumbled and I was thinking of a far more general case, i.e. undocumented behaviour in general, not just undocumented instructions. Indeed, only a relatively small subset of all these transistors is relevant in terms of undocumented instructions specifically.
I believe the first "page" of opcodes (i.e. 1-byte opcodes, the ones that don't start with 0F) has already been extensively researched and documented, at least in 16 and 32-bit mode; the interesting things are all in the "second page", the ones that begin with 0F and are relatively new instructions, and the awkward and somewhat inconsistent way in which 64-bit mode was implemented.
Also, the fact that they're trying to test undocumented behaviour from within a full OS was a bit unexpected; in the retrocomputing community, where CPUs like the Z80 and 6502 have been studied extensively, the usual way of testing undocumented behaviour is to boot into a very minimal environment whose only purpose is to test that behaviour, so as to eliminate any other variables from the process. Logic analysers/bus monitoring are also used sometimes, although that might be harder with a modern high-speed CPU.
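For what it's worth, testing from within a full OS can still be made reasonably safe by isolating each probe in a throwaway child process. A minimal sketch of that fork-and-probe idea (not anyone's actual tool; assumes an x86-64 Linux host, since the byte sequences are x86 encodings):

```python
import ctypes, mmap, os, signal

def probe(code: bytes) -> str:
    """Execute raw machine code in a forked child and classify the result
    from the child's exit status (SIGILL => the CPU rejected the bytes)."""
    pid = os.fork()
    if pid == 0:
        buf = mmap.mmap(-1, mmap.PAGESIZE,
                        prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC)
        buf.write(code)
        addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
        ctypes.CFUNCTYPE(None)(addr)()   # jump into the buffer
        os._exit(0)                      # reached only if the code returned
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return signal.Signals(os.WTERMSIG(status)).name
    return "OK"

print(probe(b"\xc3"))          # lone RET: valid, child exits cleanly
print(probe(b"\x0f\x0b\xc3"))  # UD2: architecturally guaranteed #UD
```

A crash only kills the child, so the parent can keep iterating over candidate encodings; real tools add timeouts, register snapshots, and far more careful state setup.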
Due to things like dynamic logic and PLLs (https://en.wikipedia.org/wiki/Phase-locked_loop ), modern CPUs can't clock down into the tens of MHz range or lower. There's also the issue of things like DRAM refresh.
Author here - the main reason there's no examples is because I didn't have any interesting ones to report at the time! I was trying to develop new detection methods, but found only the (thousands) of undocumented software prefetches which were previously reported by Domas in his Sandsifter project, e.g. 0f 0d /2 and /3-7 on Intel CPUs (these are documented by AMD, but not Intel, and opcode behavior varies more often between the two than you'd expect). Many of the interesting undocumented x86 opcodes (e.g. icebp, salc, loadall) were either only present in older CPUs or are now at least partially documented. There are some much more interesting undocumented opcodes on other architectures (which have architectural effects, e.g. changing register values, halting the CPU), but that's still an ongoing project.
Edit: 0f 0d /2 is documented as prefetchwt1 but (allegedly) unsupported by the CPUs I tested it on, so the fact it executes at all is undocumented.
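For readers unfamiliar with the "/2" notation: it refers to the 3-bit reg field of the ModRM byte that follows the opcode, which many instructions use as an opcode extension rather than a register operand. A small helper (hypothetical, just to illustrate the encoding):

```python
def modrm_fields(modrm: int):
    """Split a ModRM byte into (mod, reg, rm). The 'reg' field is the
    '/n' digit in Intel/AMD opcode notation, e.g. the /2 in '0f 0d /2'."""
    return (modrm >> 6) & 0b11, (modrm >> 3) & 0b111, modrm & 0b111

# '0f 0d 16' carries ModRM = 0x16 = 0b00_010_110:
# mod=00, reg=2 (the '/2' variant), rm=110 -> memory operand [rsi]
mod, reg, rm = modrm_fields(0x16)
print(mod, reg, rm)
```

So "0f 0d /2" and "0f 0d /3" are distinguished purely by that reg field, which is why a single two-byte opcode can hide several differently-behaving variants.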
Author here - the aim of this project was to explore exactly why such opcodes are problematic for security. Even if they're implemented with entirely innocent intentions - e.g. for debug+verification purposes - they can lead to vulnerabilities in operating systems, emulators, and hypervisors. They induce edge cases which developers can't protect against if they don't know they exist in the first place (due to the lack of any public documentation). There's a more thorough writeup of the project here: https://github.com/cattius/opcodetester/blob/master/thesis.p....
I recall some years ago seeing a post on the OpenBSD mailing list about Intel chip errata and thinking: "I love Big Brother, and Big Brother loves loving me."
Just thinking aloud (not the only one, obviously).
So is this the combined result of market mechanics? Intel being the leader, its top priority was to release the fastest chips at all costs, leaving security/simplicity/sustainability behind? On top of that, complexity became another barrier to competitors. This feels insane, unsustainable.
Whole parts of the industry have already switched to alternative architectures. MIPS was prevalent in set-top boxes, then replaced with ARM in the 2010s. ARM reigns on most mobile devices. RISC-V is on the rise.
Areas craving performance without concern for power consumption or security still run on Intel. For how long?
Supercomputers already get 10x more compute from GPUs than from CPUs; a switch to an alternative may come.
Could we imagine the gamer market switching to nVidia on ARM/RISC?
When I learned to program (before 9/11) there was a big emphasis on assembly language and using low level interfaces to communicate with other hardware. The idea was that everyone studying computer science should understand every aspect of the CPU down to the register and operation level, and then be able to design logic gates to replicate that functionality if needed.
Now we have CPUs that are fundamentally undocumented, unknowable, and untranslatable. The entire infrastructure of the network, the telecoms, and the cpu design itself has all been subverted to the needs of the national security complex or corporate advertising.
I'm not sure what computer science even means anymore. Everything I learned is completely useless.
It's not useless. FPGAs have plummeted in cost and there are now open-source toolchains for some of them. There's also a Free commercial-grade ISA that you can use in your personal designs (RISC-V). These days, it is not expensive to design your own well-understood computer which can run machine code generated by commercial-grade compilation toolchains such as GCC. Even hardware production is getting cheaper with shared wafer runs like MOSIS, although custom silicon is still out of reach for hobbyists.
Chin up, buddy. The US is not the entire world, and the pendulum of our generations' zeitgeist can still swing back towards the ideals of liberty and equality of access which the mavens of computing once stood for. You can already buy ARM application processors from vendors other than Intel/AMD, and I would be surprised if we lived in a world where every new computer comes with "management engine" spyware in its CPU for much longer.
I love your optimism, but I'm not sure I can see a path towards the public voting for a government that would make the necessary adjustments to rein in the ability of government powers to influence "management engine" code.
x86 was designed well before 9/11, so "now" is not any different from the past. We all know the reputation x86 gets for poor documentation -- and poor design in general. Claiming that older ISAs were better documented and easier to understand will not get you very far.
Most if not all top computer science / computer engineering programs in the states teach digital logic design, x86 / x86-64, computer architecture, compilers, communication networks in very fundamental detail as required courses. The emphasis is still there.