As much as I can understand a Zen 5 CPU core can run two AVX512 operations per c...

kvemkon · 2026-01-06T03:27:42 1767670062

> CPU is limited by its memory bandwidth for streaming tasks

That must be the reason, why EPYC 9175F exists. It is only 16-core CPU, but all 16 8-core CCDs are populated and only one core on each is active.

The next gen EPYC is rumored to have 16 instead of 12 memory channels (which were 8 only 4-5 years ago).

tanelpoder · 2026-01-06T03:53:38 1767671618

This also leaves more power & thermal allowance for the IO Hub on the CPU chip and I guess the CPU is cheaper too.

If your workload is mostly about DMAing large chunks of data around between devices and you still want to examine the chunk/packet headers (but not touch all payload) on the CPU, this could be a good choice. You should have the full PCIe/DRAM bandwidth if all CCDs are active.

Edit: Worth noting that a DMA between PCIe and RAM still goes through the IO Hub (Uncore on Intel) inside the CPU.

vjerancrnjak · 2026-01-06T11:56:44 1767700604

It is interesting that despite this we still have programming languages and libraries that cannot exploit pipelining to actually demonstrate IO is the bottleneck and not CPU