The Mill videos are worth watching again: there are variations on NaT handling, looping, branching, etc. that make DSPs much more general-purpose.
I don’t know how similar this Electron is, but the Mill explained how it could be done.
I love these videos and his enthusiasm for the problem space. Unfortunately, it seems to me that progress on the ideas has floundered because of concerns about monetizing intellectual property, which is a shame. If he had gone down a more open, RISC-V-like route, I wonder whether we would see more real-world prototypes and actual use cases. This type of thing seems great for microprocessor workloads.
It kinda sounds like it, though the article explicitly said it's not VLIW.
I've always felt like Itanium was a great idea that came too soon and was too poorly executed. It seemed like most of its commercial failure came down to the friction of switching architectures and the inane pricing rather than the merits of the architecture itself. Basically, Intel being Intel.
I disagree; Itanium was fundamentally flawed for general-purpose computing, and especially for time-shared general-purpose computing. VLIW is not practical in time-sharing systems without completely rethinking the way caches work, and Itanium didn't really do that.
As soon as a system has variable instruction latency, VLIW completely stops working; the entire concept is predicated on the compiler knowing ahead of time how many cycles each instruction will take to retire. With a memory hierarchy and a nondeterministic workload, the system inherently cannot know that, because it doesn't know up front which tier of memory an instruction's data dependencies live in.
The advantage of out-of-order execution is that it dynamically adapts to data availability.
This is also why VLIW works well where data availability is _not_ dynamic, for example in DSP applications.
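To make the latency argument concrete, here is a toy model (a sketch, not a model of any real chip). The op order is fixed by a compiler that planned for a 3-cycle load and filled exactly that gap with independent work; the hardware then sees either that 3-cycle latency or a 100-cycle cache miss. Both numbers, the op names, and the single-issue machines are assumptions for illustration. The in-order model assumes the hardware stalls on use rather than reading a stale register; the out-of-order model is idealized (unbounded window).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    name: str
    latency: int            # cycles until this op's result can be used
    deps: tuple = ()        # names of ops whose results this op reads

def program(load_latency):
    """Op order fixed by a compiler that planned for a 3-cycle load:
    two independent ops fill the gap before the load's result is used."""
    return [
        Op("load_a", load_latency),
        Op("fill0", 1),
        Op("fill1", 1),
        Op("use_a", 1, ("load_a",)),
        *[Op(f"tail{i}", 1) for i in range(6)],   # work independent of load_a
    ]

def in_order_cycles(ops):
    """Single-issue, in-order, stall-on-use: an op can't issue before the op
    ahead of it, or before its operands are ready."""
    ready, issue = {}, -1
    for op in ops:
        operands_ready = max((ready[d] for d in op.deps), default=0)
        issue = max(issue + 1, operands_ready)
        ready[op.name] = issue + op.latency
    return max(ready.values())

def out_of_order_cycles(ops):
    """Idealized single-issue out-of-order core: each cycle, issue the oldest
    op whose operands are ready (unbounded window, no other limits)."""
    ready, pending, cycle = {}, list(ops), 0
    while pending:
        for op in pending:
            if all(d in ready and ready[d] <= cycle for d in op.deps):
                ready[op.name] = cycle + op.latency
                pending.remove(op)
                break       # at most one issue per cycle
        cycle += 1
    return max(ready.values())

for latency in (3, 100):    # latency the compiler planned for vs. a cache miss
    ops = program(latency)
    print(latency, in_order_cycles(ops), out_of_order_cycles(ops))
```

When the load really takes 3 cycles, both models finish in 10 cycles: the static schedule was perfect. On a 100-cycle miss, the in-order model takes 107 cycles because everything behind `use_a` waits, while the out-of-order model finishes in 101 because it runs the independent tail during the miss.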
As for this Electron thing, the linked article is too full of puffery to tell what it's actually doing. The first paragraph says something about "no caches", but the block diagram has a bunch of caches in it. It sort of sounds like an FPGA with bigger primitives (configurable instruction tiles rather than gates), which means synchronization will continue to be the problem, and I don't know how they'll solve for variable latency.
Not to detract from your point, but Itanium's design was meant to address code compatibility between generations: code optimized for a wider chip could still run on a narrower chip because of the stop bits.
The compiler still needs to know how to schedule to optimize for a specific microarchitecture but the code would still run albeit not as efficiently.
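Roughly, the stop-bit idea can be sketched like this: the compiler promises that everything between two stops (an "instruction group") is independent, so a machine of any width can issue the group, just over more cycles if it is narrower. This is a simplified model under stated assumptions: the tuple encoding and register names are hypothetical, and it ignores the real IA-64 details (three instructions per 128-bit bundle with a template field encoding the stops, unit types, latencies, and the more detailed dependency rules).

```python
import math

# Toy instruction: (destination register, source registers, stop bit after it).
# Register names and this encoding are hypothetical, not the real IA-64 format.
code = [
    ("r1", ("r10",), False),
    ("r2", ("r11",), False),
    ("r3", ("r12",), False),
    ("r4", ("r13",), True),        # stop: the next group reads r1..r4
    ("r5", ("r1", "r2"), False),
    ("r6", ("r3", "r4"), True),
]

def groups(stream):
    """Split an instruction stream into instruction groups at stop bits."""
    out, cur = [], []
    for ins in stream:
        cur.append(ins)
        if ins[2]:                 # stop bit set: this group ends here
            out.append(cur)
            cur = []
    if cur:
        out.append(cur)
    return out

def group_is_independent(group):
    """The compiler's side of the contract (simplified): no instruction reads
    or rewrites a register written earlier in the same group."""
    written = set()
    for dest, srcs, _ in group:
        if dest in written or written.intersection(srcs):
            return False
        written.add(dest)
    return True

def issue_cycles(stream, width):
    """Cycles to issue the stream on a `width`-wide machine, ignoring latency:
    a group wider than the machine is simply issued over several cycles."""
    return sum(math.ceil(len(g) / width) for g in groups(stream))

for width in (1, 2, 6):
    print(width, issue_cycles(code, width))
```

The same binary is correct at every width; it takes 6 issue cycles on a 1-wide machine, 3 on a 2-wide one, and 2 on a 6-wide one. That is the "runs, albeit not as efficiently" property, while the choice of which instructions to group is still a per-microarchitecture scheduling decision.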
As an aside, I never looked into the perf numbers, but having adjustable register windows, while cool, probably made for terrible context-switching and/or spilling performance.
That's the simplest and most obvious way I can think of. I know the Mill folks were deeply into this space and probably invented something more clever but I haven't kept up with their research in many years.
It does feel like the world may have changed a bit now that LLVM is ubiquitous, with its intermediate representation available for specialized purposes. Translating from IR to a VLIW schedule should be easier now than it was with 90s compiler tech.
But "this is a good idea just poorly executed" seems to be the perennial curse of VLIW, and how Itanium ended up shoved onto people in the first place.
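For a sense of what "IR to a VLIW plan" involves, here is a minimal sketch of classic greedy list scheduling over a dependency DAG into fixed-width bundles. The DAG, op names, and latencies are hypothetical. Real backends (LLVM's own VLIW support, e.g. the Hexagon packetizer) also model functional-unit types, register pressure, and much more. Note that the whole thing depends on the latencies being exact, which is the weak point raised above.

```python
from functools import lru_cache

# Hypothetical dependency DAG: op -> (latency in cycles, predecessor ops)
DAG = {
    "ld_a":  (3, ()),
    "ld_b":  (3, ()),
    "mul":   (2, ("ld_a", "ld_b")),
    "inc_i": (1, ()),
    "cmp":   (1, ("inc_i",)),
    "add":   (1, ("mul",)),
    "st":    (1, ("add",)),
}

def list_schedule(dag, width):
    """Greedy list scheduling into bundles of `width` slots, assuming every
    latency is known exactly at compile time."""
    succs = {op: [s for s, (_, ps) in dag.items() if op in ps] for op in dag}

    @lru_cache(maxsize=None)
    def height(op):
        # Longest latency-weighted path from op to the end of the DAG.
        return dag[op][0] + max((height(s) for s in succs[op]), default=0)

    ready_at = {}                  # op -> cycle its result becomes available
    bundles, cycle = [], 0
    while len(ready_at) < len(dag):
        candidates = [
            op for op, (_, preds) in dag.items()
            if op not in ready_at
            and all(p in ready_at and ready_at[p] <= cycle for p in preds)
        ]
        candidates.sort(key=height, reverse=True)   # critical path first
        bundle = candidates[:width]
        for op in bundle:
            ready_at[op] = cycle + dag[op][0]
        bundles.append(bundle + ["nop"] * (width - len(bundle)))
        cycle += 1
    return bundles

for bundle in list_schedule(DAG, width=2):
    print(bundle)
```

With width 2 this yields 7 bundles, including an all-`nop` bundle where the compiler is waiting out `mul`'s assumed 2-cycle latency. If a load actually took longer than 3 cycles at run time, this fixed schedule would have to stall.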
No, but multiple GPU shader architectures use VLIW instructions. It's totally doable with modern compilers, but it's only advantageous for parallelizable tasks, hence its use in GPUs.