The linked article says: "According to prior research, only some 13% of values are used more than once". So based on the research the Mill itself is built on, your example, and the claim that an "instruction can use any result on the belt", actually describe the minority case.
As for scheduling my proposed store-addressed belt: you perform as many operations in parallel as you can, then for each operation you find the later operation that depends on its result, calculate the distance between them, and assign that distance as the producer's store address. The compiler has more work to do, yes, but it is not "much more difficult".
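A minimal sketch of that scheduling pass, assuming a simple linear list of operations where each one names its producers (the `Op` class and `schedule_store_addresses` function are purely illustrative, not from any real Mill toolchain):

```python
# Toy scheduling pass for the proposed store-addressed belt:
# for each op, find the first later op that consumes its result and
# record the distance between them as the store address.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Op:
    name: str
    inputs: List[str] = field(default_factory=list)  # names of producing ops
    store_addr: Optional[int] = None                 # distance to first consumer

def schedule_store_addresses(ops):
    for i, op in enumerate(ops):
        # Scan forward for the first operation that consumes this result.
        for j in range(i + 1, len(ops)):
            if op.name in ops[j].inputs:
                op.store_addr = j - i  # slots between producer and consumer
                break
    return ops

# Example: c = a + b; d = c * 2
ops = schedule_store_addresses([
    Op("a"), Op("b"),
    Op("c", inputs=["a", "b"]),
    Op("d", inputs=["c"]),
])
print([op.store_addr for op in ops])  # → [2, 1, 1, None]
```

This is a single linear pass per producer; a real compiler would also have to handle values with multiple consumers (the ~13% case above) and operations spanning instruction boundaries.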
Offhand, I would say one difference is that your model is trying to predict where the belt will be in the future, while the Mill looks backwards to find where the belt was in the past.
Another issue is that you would have to process the entire instruction in order to know where each operation gets its input. (How many operations in the instruction take things off the belt before I get my data?) In the Mill, the operations are parsed in parallel, and each has all the information it needs to start processing as soon as the instruction (block) is loaded into the buffer.
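A toy illustration of that decode difference (hypothetical encodings, not the real Mill ISA): with backward belt addressing, each operation's input position is a pure function of its own bits, so all operations in an instruction can be decoded independently; with forward store addresses, locating an operation's input requires a prefix sum over the earlier operations in the same instruction, which is an inherently serial scan.

```python
def decode_mill(offsets, belt_front):
    # Backward addressing: each lookup depends only on that op's own
    # offset plus the shared belt front, so decode is parallel-friendly.
    return [belt_front - off for off in offsets]

def decode_store_addressed(result_counts):
    # Forward addressing: op k's slot depends on how many results
    # ops 0..k-1 push, so decode requires a running (serial) sum.
    slots, total = [], 0
    for n in result_counts:
        slots.append(total)
        total += n
    return slots

print(decode_mill([1, 3, 2], belt_front=7))  # → [6, 4, 5]
print(decode_store_addressed([1, 2, 1]))     # → [0, 1, 3]
```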
The size of the belt is a very finely tuned constraint (arrived at through simulation) that basically depends on how many cycles you have to save a value to the scratchpad memory (if needed) before it "drops off" the belt. There is a lecture that explains why it takes the number of cycles it does; if you watch it, you will probably understand better that the Mill is not about what is easy or hard for the compiler, but all about getting the silicon to jump through hoops fast and efficiently.
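A toy model of that spill-window constraint (assumed semantics for illustration, not the Mill's actual timing: I simplify to one push per cycle, and the belt length and function names are made up):

```python
BELT_SIZE = 8  # illustrative only; real belt lengths are per-family tuning results

def needs_spill(produced_at, last_use, belt_size=BELT_SIZE):
    # With one push per cycle, a value falls off the belt `belt_size`
    # pushes after it is produced; if its last use comes later, it must
    # be saved to the scratchpad while it is still on the belt.
    return last_use - produced_at >= belt_size

def spill_deadline(produced_at, belt_size=BELT_SIZE):
    # Last cycle at which the value is still on the belt and can be saved.
    return produced_at + belt_size - 1

print(needs_spill(produced_at=0, last_use=5))   # → False, still on the belt
print(needs_spill(produced_at=0, last_use=12))  # → True, must be spilled
print(spill_deadline(produced_at=0))            # → 7
```

The point of the tuning is exactly this window: the belt must be long enough that the spill can always be scheduled before the deadline, and no longer.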