Hi, thanks for the extensive reply - a lot to digest and reflect on!
First of all, I think I broadly agree with the direction of your argument. In the early 2000s the decision was made to focus on single-core performance and SIMD extensions rather than embrace a massively multicore future. I guess Intel got burned by Itanium and decided that 100% compatibility with existing software was essential.
I think that road has run out now. Single-core performance improvements have slowed and big SIMD is dying (hello, AVX-512!). Desktop core counts are stuck, but on the server you can use 128-core EC2 instances. How long before this appears in a box on your desk?
Massively multicore GPUs have taken over ML, but having tried to use GPUs for general-purpose computing, there are huge issues - e.g. the overhead of transferring data and limited GPU memory sizes. The good news is that with the right tools - say, OpenCL - you can write code that runs on both CPU and GPU and takes advantage of increasing core counts on both.
So I think we’re on the cusp of a change: much higher CPU core counts and developers having the tools to make use of those cores.
A couple of PSs:
It will be interesting to see whether someone tries putting lots of simple in-order cores on a single die (I think there are early RISC-V attempts at this).
The transputer in the 1980s was an early experiment in massively multicore CPU systems.
The Arm team knew early on that memory bandwidth was key and focused on it with the ARM1 (and had been rebuffed by Intel when they asked for a higher-bandwidth x86 core). The rest is history!