Jim Keller is bullish on chiplets, and it's not just about yield: it's also about power efficiency and the economics of using the most expensive nodes (e.g. 3nm) only for critical parts, then combining them Lego-style with IP fabricated on cheaper nodes.
That’s interesting, I’m curious about the argument for power efficiency (it seems like breaking a design up into chiplets would only hurt there, but I’m no Jim Keller!). I’ll definitely check the video out when I get a chance.
Lego-ing out IP blocks could be really cool. It would be neat if this gave OEMs room to compete in a more technical field (selling Xeons in Dell cases vs Xeons in Lenovo cases is not very cool; if Dell and Lenovo could pick the parts in their package, that could be interesting).
I think this may be where some manufacturers are heading, but not necessarily the big "consumer" names like Intel/AMD. It's a way to significantly lower the risk of building an SoC, and I think it's more plausible that this sort of third-party chiplet market will be driven by the Marvell/Broadcom types rather than Intel/AMD.
Something impressive about Tenstorrent's original architecture (I haven't been following the talk about pivoting to RISC-V) was the low cost of validation. The idea seemed to be 'cheapest to tape out with max FLOPS', and it somehow worked out well. I can't say more; I never managed to get my hands on one of those Grayskulls so...
Quietly, digital ASIC validation has gotten very cheap and quick over the last 20 years thanks to the proliferation of software models and compilers that translate Verilog to C++ (which are literally orders of magnitude faster than previous simulators). I'm not surprised that Tenstorrent has been doing well with that, given the expertise they have.
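For anyone who hasn't seen one of these flows, here's a minimal sketch of what driving a Verilator-translated design from C++ looks like (the counter.v module and its clk/rst/count ports are hypothetical, and this is generic Verilator usage, not anything Tenstorrent-specific):

    // Build with something like:
    //   verilator --cc counter.v --exe tb.cpp && make -C obj_dir -f Vcounter.mk
    #include "Vcounter.h"   // C++ class generated from counter.v
    #include "verilated.h"
    #include <cstdio>

    int main(int argc, char** argv) {
        Verilated::commandArgs(argc, argv);
        Vcounter dut;                        // the RTL, now an ordinary C++ object

        // Apply reset, then toggle the clock; everything runs at compiled-C++ speed.
        dut.rst = 1; dut.clk = 0; dut.eval();
        dut.clk = 1; dut.eval();
        dut.rst = 0;

        for (int cycle = 0; cycle < 100; ++cycle) {
            dut.clk = 0; dut.eval();
            dut.clk = 1; dut.eval();
        }
        std::printf("count after 100 cycles = %u\n", (unsigned)dut.count);
        dut.final();
        return 0;
    }

The whole regression is just a native binary, which is a big part of why this is so much faster than the old event-driven simulators.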
I am not sure where power efficiency came from. Both AMD and Intel, along with TSMC, were very clear it was not about power efficiency, whether that is the current AMD chiplet approach or CoWoS (TSMC) / EMIB (Intel).
The linked video above makes his assertion pretty clear. If you can move components that were previously off-package onto a chiplet in the same package, you've shortened the interconnect traces and can increase speed while reducing power consumption. In his particular case, the hope is to combine CPU, AI, and IO chiplets into the same package.
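As a rough back-of-the-envelope (my illustrative numbers, not figures from the video): the dynamic energy to drive a wire scales with its capacitance, and capacitance scales with length, so pulling a link from centimeters of board trace down to millimeters inside the package cuts the switching energy roughly in proportion:

    % E ~ (1/2) C V^2 per transition, assuming ~1 pF/cm of PCB trace and a 1.0 V swing
    E_{\mathrm{board}}   \approx \tfrac{1}{2}(5\,\mathrm{pF})(1\,\mathrm{V})^2   \approx 2.5\,\mathrm{pJ}  \quad (\sim 5\,\mathrm{cm\ trace})
    E_{\mathrm{package}} \approx \tfrac{1}{2}(0.5\,\mathrm{pF})(1\,\mathrm{V})^2 \approx 0.25\,\mathrm{pJ} \quad (\sim 5\,\mathrm{mm\ link})

And real board-level links also need SerDes, termination, and larger drivers, so the practical per-bit gap tends to be even bigger than the raw C·V² estimate suggests.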
In a way, these points are roughly how modern computers work.
For instance, let's say you bought a new PC in 2017 with a high-end graphics card. Your CPU was probably manufactured on a 14nm process, your GPU on 16nm, your RAM on anything from 16 to 20nm, and your motherboard chipset on 22nm. It's been a similar story throughout computing.
Taking that idea and scaling it down to the processor designs themselves is cool, and no doubt faces hard technical challenges.
AMD has used the same IO die across multiple generations of Zen hardware. That cuts the dev and validation costs of moving the compute dies to new processes.
Fungibility also helps reduce the cost of making SKUs. Want more cores? Add more compute dies. Your customers want more big chips than expected? It's easier/faster to adapt production since only the final packaging steps differ; the component pieces are the same.
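A toy way to picture it (hypothetical names and counts): every SKU in a lineup is the same two die designs, just packaged in different quantities, so shifting the product mix is a packaging decision rather than a new tape-out.

    // Toy sketch: a whole product stack built from two shared die designs.
    #include <cstdio>

    struct Sku {
        const char* name;
        int compute_dies;   // identical compute chiplets, binned as needed
        int io_dies;        // one shared IO die reused across the lineup
    };

    int main() {
        const Sku lineup[] = {
            {"16-core desktop",     2, 1},
            {"32-core workstation", 4, 1},
            {"64-core server",      8, 1},
        };
        for (const Sku& s : lineup)
            std::printf("%-20s = %d x compute die + %d x IO die\n",
                        s.name, s.compute_dies, s.io_dies);
        return 0;
    }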
Except advanced packaging isn't cheap. So while you save on validation, yield, and reuse of components, you're trading those savings for a much higher packaging cost. Although, with CoWoS being a must-have for all future HPC/AI requirements, I assume we could drive down the cost relatively quickly. But I still have doubts about whether this makes sense in certain market segments, especially at the lower end.
https://youtu.be/c_Wj_rNln-s?t=127