also allows extremely long instructions: the ISA allows up to 15 bytes and faults at 16 (without that artificial limit you could create arbitrarily long x86 instructions).
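To make the 15-byte limit concrete, here's a small C sketch of mine (the array name is made up): one legal 15-byte instruction built by stacking redundant 0x66 operand-size prefixes in front of a NOP. One more prefix byte would push it to 16, and the CPU would fault instead of executing it.

    #include <stdio.h>

    /* One legal 15-byte x86-64 instruction: 14 redundant 0x66
     * operand-size prefixes followed by NOP (0x90). A 15th prefix
     * would make the instruction 16 bytes and the CPU would fault
     * on it rather than execute it. */
    static const unsigned char long_nop[] = {
        0x66, 0x66, 0x66, 0x66, 0x66, 0x66, 0x66,
        0x66, 0x66, 0x66, 0x66, 0x66, 0x66, 0x66,   /* 14 prefixes */
        0x90                                         /* nop         */
    };

    int main(void) {
        printf("instruction length: %zu bytes\n", sizeof long_nop);  /* 15 */
        return 0;
    }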
What a nightmare, but it makes me wonder: rather than decoding into micro-ops at runtime, could Intel or AMD "JIT" code up-front, in hardware, into a better bytecode?
I'm sure it wouldn't work for everything, but why wouldn't it be feasible to keep a cache of decoding results by page or something?
This is exactly how the hardware works and what micro-ops are: on any system with a u-op cache or trace cache, those decoded instructions are cached and reused instead of being decoded again. Unfortunately you still have to decode each instruction at least once, and that first decode is the bottleneck being discussed here. This is all transparent to the OS and not visible outside a low-level instruction cache, which means you don't need a major OS change; but arguably, if you were willing to take that hit, you could go further.
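To illustrate the "decode once, reuse many times" idea, here's a toy software analogy (all names, sizes, and structures are made up and bear no resemblance to the real silicon): decoded results are keyed by fetch address, and the expensive decoder only runs on a miss.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define UOP_CACHE_ENTRIES 64
    #define MAX_UOPS_PER_LINE  6

    typedef struct { uint8_t opcode, dst, src; } MicroOp;

    typedef struct {
        bool     valid;
        uint64_t fetch_addr;                 /* tag                   */
        MicroOp  uops[MAX_UOPS_PER_LINE];    /* pre-decoded micro-ops */
        int      count;
    } UopCacheLine;

    static UopCacheLine uop_cache[UOP_CACHE_ENTRIES];
    static int decode_calls;

    /* Stand-in for the slow legacy decoder: here it just fabricates
     * one fake micro-op so the sketch is self-contained. */
    static int decode_x86(uint64_t fetch_addr, MicroOp *out)
    {
        decode_calls++;
        out[0] = (MicroOp){ .opcode = (uint8_t)fetch_addr, .dst = 0, .src = 1 };
        return 1;
    }

    static int fetch_uops(uint64_t fetch_addr, MicroOp *out)
    {
        UopCacheLine *line = &uop_cache[(fetch_addr >> 4) % UOP_CACHE_ENTRIES];

        if (line->valid && line->fetch_addr == fetch_addr) {
            /* Hit: reuse the decode we already paid for. */
            memcpy(out, line->uops, line->count * sizeof *out);
            return line->count;
        }

        /* Miss: pay the decode cost once, then cache the result. */
        line->count      = decode_x86(fetch_addr, line->uops);
        line->fetch_addr = fetch_addr;
        line->valid      = true;
        memcpy(out, line->uops, line->count * sizeof *out);
        return line->count;
    }

    int main(void)
    {
        MicroOp buf[MAX_UOPS_PER_LINE];
        fetch_uops(0x401000, buf);   /* miss: goes through the decoder  */
        fetch_uops(0x401000, buf);   /* hit: served from the u-op cache */
        printf("decoder ran %d time(s) for 2 fetches\n", decode_calls);
        return 0;
    }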
So what stops x86 from adding the micro-ops as a dedicated alternate instruction set, Thumb style? Maybe with the implication that Intel will not hold the instruction set stable between chips, pushing vendors to compile to it on the fly?
They do something similar for loops. The CPU doesn't decode the same instructions over and over again; it just replays them from the 'decoded instruction cache' (the u-op cache), which has a capacity of around 1,500 micro-ops.