Undocumented CPU behavior: analyzing undocumented opcodes on Intel x86-64 (2018) [pdf] (cattius.com)
205 points by luu on March 8, 2020 | 42 comments


Associated GitHub repository with more information: https://github.com/cattius/opcodetester

I know next to nothing about processors at this level, but I wonder if it would be possible for a skilled engineer to try to find these instructions by scrutinizing the actual physical instruction decoder on the chip and/or inspecting the processor's microcode. Are these things possible to do? If they are, is it feasible to reverse engineer them?


A team from Ruhr University Bochum reverse-engineered the microcode for the AMD K8 and K10 to implement custom microcode programs; they describe how they reverse-engineered the ROM here: https://www.syssec.ruhr-uni-bochum.de/media/emma/veroeffentl.... The problem with reverse-engineering on newer CPUs is that on both Intel and AMD the updates are now protected with cryptographic authentication, so you can't run arbitrary custom microcode to aid in understanding what it does. And as others have mentioned, the hardware complexity and small feature sizes make reverse-engineering the microcode engine or ROM by physical inspection much harder than on the K8. I expect it will be achieved at some point, though.


Ryzenfall and Masterkey together provide a complete break of the PSP authentication with arbitrary code execution if you are running a vulnerable AGESA.

You will of course have to flash back down to a vulnerable version, but it should give you what you need. Also, as this was fixed prior to Zen 2, you wouldn't be able to test their newer chips.

Basically, AMD has a buffer overflow in an SMM call, and they conveniently signed an empty PSP module payload that gets executed anyway.

https://youtu.be/QuqefIZrRWc


Even if you manage to understand the microcode, there could be custom behavior in the hardware logic, right?

So I would guess that studying the microcode alone can't prove there is nothing else going on.

I am no EE, though.


Yeah, that's right - you could prove the absence of malicious microcode, but not the absence of hardware trojans, implants, etc. There are also the embedded microcontrollers to deal with (e.g. the Management Engine, Innovation Engine, and Power Management Controller on Intel).


Yes it is, see https://arstechnica.com/gaming/2017/07/mame-devs-are-crackin... or http://www.visual6502.org/

But future CPUs will have vertically stacked circuitry, making reverse engineering much harder (impossible?).

https://en.wikipedia.org/wiki/Three-dimensional_integrated_c...

https://en.wikichip.org/wiki/intel/foveros


Why would stacking layers make it impossible? The process will be destructive, but I would expect it to work.


How would you ensure you only etched away the layers you were interested in?


It's already possible to delayer pretty accurately --- see the visual6502 project linked above.


Theoretically possible, practically impossible.

For two reasons: the equipment needed (extremely expensive) and the complexity of the task (transistors are placed by software, not humans anymore).


> transistors are placed by software, not humans anymore

If anything I'd say that makes it even easier to reverse-engineer, since the layout is far more regular. There are some public tools to do this already - here's one that immediately comes to mind: https://degate.org/


This was also linked elsewhere but a security researcher was able to identify a significant number of undocumented instructions: https://www.youtube.com/watch?v=KrksBdWcZgQ


What kind of complexity are we looking at, roughly? Surely someone with deep pockets and the necessary expertise would be interested in trying to find these kinds of things, no?


A modern CPU has a transistor count in the billions/low tens of billions. I haven't really thought about it, but I'm tempted to say that looking at the decoder stage(s) alone won't do. Undocumented operation doesn't have to be in the form of an entire undocumented instruction. You could design the device so that the "right thing" would happen simply by scheduling the right instruction, with the right arguments, under the right conditions (the right execution unit, the right amount of pipeline clog, etc.). The whole thing is significantly more complex than the "fetch-decode-execute" diagrams would have you believe -- execution isn't strictly sequential, executing exactly the same instruction won't cause the exact same transistors to "fire" each time, etc.

So the level of complexity is pretty daunting. IMHO if you want to find out undocumented behaviour that was deliberately introduced, you're better off looking at other methods, no matter how deep your pockets are.


> A modern CPU has a transistor count in the billions/low tens of billions

A large percentage of that is simply 6T-per-cell SRAM (L1/L2/etc. caches), though.


Certainly. But even those can be routed so that the WR signal for a particular address also doubles as half of the AND input which causes a read from another range to always return zero, for example. (That's not an "undocumented opcode", of course, but it can be used maliciously). It's certainly not easy to do this kind of meaningful obfuscation though, especially between different blocks, since different blocks are usually in different clock domains, too.

Edit: sorry, my neurons got all jumbled and I was thinking of a far more general case, i.e. undocumented behaviour in general, not just undocumented instructions. Indeed, only a relatively small subset of all these transistors is relevant in terms of undocumented instructions specifically.


I am no EE, but even if you had the tools, it would be harder than reverse-engineering any software.

At least with software you can go step by step and inspect the memory at the very least.

With hardware you would have to take into account the entire chip state and routing.


Related talk about this from Blackhat a few years ago:

https://www.youtube.com/watch?v=KrksBdWcZgQ


Unfortunately Chris works for Intel now so I don't think he'll be giving any more of these talks in the future. (At least until his NDA expires)


All good things get eaten by the majors eventually, it would seem.


I believe the first "page" of opcodes (i.e. 1-byte opcodes, the ones that don't start with 0F) has already been extensively researched and documented, at least in 16- and 32-bit mode; the interesting things are all in the "second page" (the opcodes that begin with 0F, which are the relatively new instructions) and in the awkward, somewhat inconsistent way in which 64-bit mode was implemented.

Also, the fact that they're trying to test undocumented behaviour from within a full OS was a bit unexpected; in the retrocomputing community, where CPUs like the Z80 and 6502 have been studied extensively, the usual way of testing undocumented behaviour is to boot into a very minimal environment whose only purpose is to test that behaviour, so as to eliminate any other variables from the process. Logic analysers/bus monitoring are also used sometimes, although that might be harder with a modern high-speed CPU.
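
For what it's worth, the from-within-an-OS probing that tools like Sandsifter and opcodetester do boils down to executing candidate bytes and trapping the resulting #UD fault as SIGILL. Here's a minimal sketch of that idea (my own illustration, not code from the paper; Linux x86-64 only, and real tools additionally handle SIGSEGV, infer instruction lengths, sweep the whole opcode space, etc.):

    // Sketch: does the CPU raise #UD for a given byte sequence?
    #include <setjmp.h>
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    static sigjmp_buf env;

    static void on_sigill(int sig) {
        (void)sig;
        siglongjmp(env, 1);           // escape the faulting instruction
    }

    // Returns 1 if the bytes executed without #UD, 0 if SIGILL was raised.
    static int executes(const unsigned char *bytes, size_t len) {
        // RWX page holding the candidate bytes followed by a RET.
        unsigned char *buf = mmap(NULL, 4096,
                                  PROT_READ | PROT_WRITE | PROT_EXEC,
                                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) return 0;
        memcpy(buf, bytes, len);
        buf[len] = 0xC3;              // ret

        signal(SIGILL, on_sigill);
        int ok = 0;
        if (sigsetjmp(env, 1) == 0) { // saves the signal mask for the longjmp
            ((void (*)(void))buf)();  // execute the candidate instruction
            ok = 1;                   // only reached if no fault occurred
        }
        munmap(buf, 4096);
        return ok;
    }

    int main(void) {
        // ud2 (0f 0b) is architecturally guaranteed to raise #UD.
        unsigned char ud2[] = {0x0F, 0x0B};
        printf("ud2 executes: %d\n", executes(ud2, sizeof ud2));
        return 0;
    }

The hard part isn't executing the bytes; it's deciding how long a candidate instruction is, which is what Sandsifter's page-boundary tricks are for.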


"High speed" shouldn't be a concern, should it? By adjusting the clocks I believe you can run the CPU as slow as you wish.

Complexity, and the ratio of visible behavior to unobservable state, is astronomically worse than for an 8-bit CPU, and is therefore still a concern.


Due to things like dynamic logic and PLLs (https://en.wikipedia.org/wiki/Phase-locked_loop), modern CPUs can't clock down into the tens of MHz range or lower. There's also the issue of things like DRAM refresh.


It would actually be interesting to see examples of real undocumented opcodes. There are none in the linked article.


But then they would be documented.

Think, McFly!


Think more! One would guess that the search for undocumented opcodes yields results...?


Author here - the main reason there are no examples is because I didn't have any interesting ones to report at the time! I was trying to develop new detection methods, but found only the thousands of undocumented software prefetches which were previously reported by Domas in his Sandsifter project, e.g. 0f 0d /2 and /3-7 on Intel CPUs (these are documented by AMD, but not Intel, and opcode behavior varies more often between the two than you'd expect). Many of the interesting undocumented x86 opcodes (e.g. icebp, salc, loadall) were either only present in older CPUs or are now at least partially documented. There are some much more interesting undocumented opcodes on other architectures (which have architectural effects, e.g. changing register values, halting the CPU), but that's still an ongoing project.

Edit: 0f 0d /2 is documented as prefetchwt1 but (allegedly) unsupported by the CPUs I tested it on, so the fact it executes at all is undocumented.
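
If anyone wants to reproduce that on their own machine, it's easiest to emit the raw bytes, since assemblers may refuse the mnemonic. A quick sketch (again my own illustration, not from the thesis; GCC/Clang on x86-64, with [rax] as an arbitrary choice of memory operand - modrm 0x10 is mod=00, reg=/2, rm=rax):

    // Sketch: execute 0f 0d /2 (the byte pattern of prefetchwt1).
    // On a CPU where it's truly unsupported this should raise #UD
    // (SIGILL); the observation above is that it executes silently.
    #include <stdio.h>

    int main(void) {
        static char scratch[64];
        __asm__ volatile(".byte 0x0F, 0x0D, 0x10"  // 0f 0d /2, [rax]
                         : : "a"(scratch) : "memory");
        puts("0f 0d /2 executed without #UD");
        return 0;
    }

Changing the modrm reg field (0x18 for /3, up through 0x38 for /7) covers the other variants mentioned above.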


Undocumented for you doesn't remove the possibility that someone, somewhere, has a firm grasp of what that opcode does, and why.


Author here - the aim of this project was to explore exactly why such opcodes are problematic for security. Even if they're implemented with entirely innocent intentions - e.g. for debug+verification purposes - they can lead to vulnerabilities in operating systems, emulators, and hypervisors. They induce edge cases which developers can't protect against if they don't know they exist in the first place (due to the lack of any public documentation). There's a more thorough writeup of the project here: https://github.com/cattius/opcodetester/blob/master/thesis.p....
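
To make the emulator/hypervisor point concrete, here's a toy illustration (mine, not from the thesis): a decode loop that silently treats unknown opcodes as NOPs. If the physical CPU actually implements one of those opcodes with side effects, emulated and native execution diverge, and no test suite will catch it because the opcode isn't in any manual.

    // Toy emulator dispatch; the default case is the hazard.
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    static void step(uint8_t *regs, const uint8_t *code, size_t *pc) {
        uint8_t op = code[(*pc)++];
        switch (op) {
        case 0x01: regs[0] += 1; break;  // documented: increment r0
        case 0x02: regs[0] -= 1; break;  // documented: decrement r0
        default:                         // undocumented opcode...
            break;                       // ...silently treated as a NOP
        }
    }

    int main(void) {
        uint8_t regs[4] = {0};
        const uint8_t code[] = {0x01, 0x7F, 0x01};  // inc, ???, inc
        size_t pc = 0;
        while (pc < sizeof code) step(regs, code, &pc);
        printf("r0 = %u\n", regs[0]);  // 2 here; real hardware with a
                                       // secret 0x7F might disagree
        return 0;
    }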


I recall some years ago seeing a post on the OpenBSD mailing list about Intel chip errata and thinking: "I love Big Brother, and Big Brother loves loving me."


The final thesis behind this slide deck is alongside it in the repo: https://github.com/cattius/opcodetester/


https://en.wikipedia.org/wiki/LOADALL

You could almost emulate an MMU on a 286.


Just thinking aloud (not the only one, obviously).

So is this the combined result of market mechanics? Intel being the leader, their top priority was to release the fastest chips at all costs, leaving security/simplicity/sustainability behind? On top of that, complexity became another barrier to competitors. This feels insane and unsustainable.

Whole parts of the industry have already switched to alternative architectures. MIPS was prevalent in set-top boxes, then replaced with ARM in the 2010s. ARM reigns on most mobile devices. RISC-V is on the rise.

Areas craving performance without concern for power consumption or security still run on Intel. For how long?

Supercomputers get 10x more performance from GPUs than from CPUs; a switch to an alternative may come.

Could we imagine the gamer market switching to nVidia on ARM/RISC?

Is the Intel architecture a huge sinking ship?


Well... For the motivation "New flaw in Intel chips lets attackers slip their own data into secure enclave" https://news.ycombinator.com/item?id=22537216 and for the predicted outcome "ARM-ed Mac: Not Again Or For Real This Time?" https://mondaynote.com/arm-ed-mac-not-again-or-for-real-this...

Intel bashing becomes much too easy.


Will the video for this be uploaded?


Sorry - the presentation was never recorded. There's more information in the GitHub repo, however: https://github.com/cattius/opcodetester.


nice document


When I learned to program (before 9/11) there was a big emphasis on assembly language and using low level interfaces to communicate with other hardware. The idea was that everyone studying computer science should understand every aspect of the CPU down to the register and operation level, and then be able to design logic gates to replicate that functionality if needed.

Now we have CPUs that are fundamentally undocumented, unknowable, and untranslatable. The entire infrastructure of the network, the telecoms, and the cpu design itself has all been subverted to the needs of the national security complex or corporate advertising.

I'm not sure what computer science even means anymore. Everything I learned is completely useless.


I'm with you up to the last paragraph.

It's not useless. FPGAs have plummeted in cost and there are now open-source toolchains for some of them. There's also a Free commercial-grade ISA that you can use in your personal designs (RISC-V). These days, it is not expensive to design your own well-understood computer which can run code generated by commercial-grade compiler toolchains such as GCC. Even hardware production is getting cheapER with shared wafer runs like MOSIS, although custom silicon is still out of reach for hobbyists.

Chin up, buddy. The US is not the entire world, and the pendulum of our generation's zeitgeist can still swing back towards the ideals of liberty and equality of access which the mavens of computing once stood for. You can already buy ARM application processors from vendors other than Intel/AMD, and I would be surprised if we keep living in a world where every new computer comes with "management engine" spyware in its CPU for much longer.


I love your optimism, but I'm not sure I can see a path towards the public voting for a government that would make the necessary adjustments to rein in the ability of government powers to influence "management engine" code.


x86 was designed way back, well before 9/11, so "now" is not any different from the past. We all know the rep x86 gets for poor documentation - poor design in general. Claiming that older ISAs were better documented and easier to understand will not get you very far.

Most if not all top computer science / computer engineering programs in the States teach digital logic design, x86/x86-64, computer architecture, compilers, and communication networks in very fundamental detail as required courses. The emphasis is still there.


The set of ~8 undocumented but well-known i386 instructions predates not only 9/11 but even the 486.



