The key to the Dietz system (MOG) is that the native code the GPU runs is a bytecode interpreter. The bytecode's "instruction pointer", together with the rest of its state, is just data in registers and memory that the native interpreter acts on. So each thread's instruction pointer can point at a different bytecode instruction: the interpreter executes the same native code on every thread, but the results differ. Effectively you are simulating a general-purpose CPU running a different instruction on each thread. There are further tricks required to make this efficient, of course, but you are effectively running a different general-purpose instruction per thread (as I recall, it actually interprets MIPS assembly).
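To give a feel for the idea, here's a minimal CUDA sketch, not the actual MOG code (which, as noted, targets a MIPS-like ISA): every thread runs the same native interpreter loop, but each thread's instruction pointer is just data, so each thread effectively executes its own bytecode program. The kernel name, the toy opcodes, and the two tiny programs are all made up for illustration.

    // Toy per-thread bytecode interpreter (illustrative sketch, not MOG itself).
    #include <cstdio>
    #include <cuda_runtime.h>

    enum Op { OP_ADD = 0, OP_MUL = 1, OP_HALT = 2 };

    __global__ void interp(const int *code, const int *entry, int *acc, int steps)
    {
        int tid = blockIdx.x * blockDim.x + threadIdx.x;
        int ip  = entry[tid];   // per-thread instruction pointer: ordinary data
        int a   = acc[tid];     // per-thread accumulator

        for (int s = 0; s < steps; ++s) {
            int op  = code[ip];
            int arg = code[ip + 1];
            switch (op) {                              // every thread runs this same
            case OP_ADD:  a += arg; ip += 2; break;    // native switch, but takes a
            case OP_MUL:  a *= arg; ip += 2; break;    // different branch depending
            case OP_HALT: /* stay put */        break; // on its own bytecode
            }
        }
        acc[tid] = a;
    }

    int main()
    {
        // Two tiny programs in one bytecode array:
        //   offset 0: ADD 3, HALT      offset 4: MUL 2, HALT
        int code[]  = { OP_ADD, 3, OP_HALT, 0,  OP_MUL, 2, OP_HALT, 0 };
        int entry[] = { 0, 4 };     // thread 0 runs the first program, thread 1 the second
        int acc[]   = { 10, 10 };

        int *d_code, *d_entry, *d_acc;
        cudaMalloc(&d_code,  sizeof(code));
        cudaMalloc(&d_entry, sizeof(entry));
        cudaMalloc(&d_acc,   sizeof(acc));
        cudaMemcpy(d_code,  code,  sizeof(code),  cudaMemcpyHostToDevice);
        cudaMemcpy(d_entry, entry, sizeof(entry), cudaMemcpyHostToDevice);
        cudaMemcpy(d_acc,   acc,   sizeof(acc),   cudaMemcpyHostToDevice);

        interp<<<1, 2>>>(d_code, d_entry, d_acc, 4);
        cudaMemcpy(acc, d_acc, sizeof(acc), cudaMemcpyDeviceToHost);
        printf("thread 0 -> %d, thread 1 -> %d\n", acc[0], acc[1]);  // 13, 20
        return 0;
    }

Within a warp the divergent switch cases serialize, which is exactly the efficiency problem those "further tricks" in MOG have to deal with.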
This is more or less what I'm talking about. I wonder what possibilities lie in applying the huge numerical throughput of a GPU to the predictive parts of a CPU, such as memory prefetch prediction, branch prediction, etc.
Not totally dissimilar to the thinking behind NetBurst, which seemed to be all about having a deep pipeline and keeping it fed with high-quality predictions.
I'm not sure whether your idea in particular is possible, but who knows. There may be fundamental limits to speeding up computation via speculative look-ahead no matter how many parallel tracks you have, and it may run into memory throughput issues.
But take a look at the MOG code and see what you can do.