The linked article says: "According to prior research, only some 13% of values are used more than once". So based on the research the Mill itself is built on, your example, and the claim that an "instruction can use any result on the belt", actually describe the minority case.
As for scheduling my proposed store-addressed belt: you perform as many operations in parallel as you can, then for each operation you find the later operation that depends on its result, calculate the distance between them, and assign that distance as the producer's store address. The compiler has more work to do, yes, but it is not "much more difficult".
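A minimal sketch of that scheduling pass, assuming a simple linear list of operations where each one names its producers (the `Op` class and `schedule_store_addresses` function are purely illustrative, not from any real Mill toolchain):

```python
# Toy scheduling pass for the proposed store-addressed belt:
# for each op, find the first later op that consumes its result and
# record the distance between them as the store address.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Op:
    name: str
    inputs: List[str] = field(default_factory=list)  # names of producing ops
    store_addr: Optional[int] = None                 # distance to first consumer

def schedule_store_addresses(ops):
    for i, op in enumerate(ops):
        # Scan forward for the first operation that consumes this result.
        for j in range(i + 1, len(ops)):
            if op.name in ops[j].inputs:
                op.store_addr = j - i  # slots between producer and consumer
                break
    return ops

# Example: c = a + b; d = c * 2
ops = schedule_store_addresses([
    Op("a"), Op("b"),
    Op("c", inputs=["a", "b"]),
    Op("d", inputs=["c"]),
])
print([op.store_addr for op in ops])  # → [2, 1, 1, None]
```

This is a single linear pass per producer; a real compiler would also have to handle values with multiple consumers (the ~13% case above) and operations spanning instruction boundaries.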
Offhand, I would say one difference is that your model is trying to predict where the belt will be in the future, while the Mill looks backwards to find where the belt was in the past.
Another issue is that you would have to process the entire instruction in order to know where each operation gets its input. (How many operations in the instruction take things off the belt before I get my data?) In the Mill, the operations are parsed in parallel, and each has all the information it needs to start processing as soon as the instruction (block) is loaded into the buffer.
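A toy illustration of that decode difference (hypothetical encodings, not the real Mill ISA): with backward belt addressing, each operation's input position is a pure function of its own bits, so all operations in an instruction can be decoded independently; with forward store addresses, locating an operation's input requires a prefix sum over the earlier operations in the same instruction, which is an inherently serial scan.

```python
def decode_mill(offsets, belt_front):
    # Backward addressing: each lookup depends only on that op's own
    # offset plus the shared belt front, so decode is parallel-friendly.
    return [belt_front - off for off in offsets]

def decode_store_addressed(result_counts):
    # Forward addressing: op k's slot depends on how many results
    # ops 0..k-1 push, so decode requires a running (serial) sum.
    slots, total = [], 0
    for n in result_counts:
        slots.append(total)
        total += n
    return slots

print(decode_mill([1, 3, 2], belt_front=7))  # → [6, 4, 5]
print(decode_store_addressed([1, 2, 1]))     # → [0, 1, 3]
```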
The size of the belt is a very finely tuned constraint (arrived at through simulation) that basically depends on how many cycles you have to save a value to the scratchpad memory (if needed) before it "drops off" the belt. There is a lecture that explains why it takes the number of cycles it does; if you watch it, you will probably understand better that the Mill is not about what is easy or hard for the compiler, but all about getting the silicon to jump through hoops fast and efficiently.
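A toy model of that spill-window constraint (assumed semantics for illustration, not the Mill's actual timing: I simplify to one push per cycle, and the belt length and function names are made up):

```python
BELT_SIZE = 8  # illustrative only; real belt lengths are per-family tuning results

def needs_spill(produced_at, last_use, belt_size=BELT_SIZE):
    # With one push per cycle, a value falls off the belt `belt_size`
    # pushes after it is produced; if its last use comes later, it must
    # be saved to the scratchpad while it is still on the belt.
    return last_use - produced_at >= belt_size

def spill_deadline(produced_at, belt_size=BELT_SIZE):
    # Last cycle at which the value is still on the belt and can be saved.
    return produced_at + belt_size - 1

print(needs_spill(produced_at=0, last_use=5))   # → False, still on the belt
print(needs_spill(produced_at=0, last_use=12))  # → True, must be spilled
print(spill_deadline(produced_at=0))            # → 7
```

The point of the tuning is exactly this window: the belt must be long enough that the spill can always be scheduled before the deadline, and no longer.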