Maybe EPYC can make better use of the available bandwidth, but for comparison, I have a water-cooled Xeon W5-3435X running at 4.7GHz all-core with 8 channels of DDR5-6400, and CPU inference is still dog slow. With a 70B Q8 model I get 1 tok/s, far below the ~5.8 tok/s I expected from 410GB/s of theoretical max RAM bandwidth (410GB/s / 70GB). If I run on 5x A4000s I get 6.1 tok/s, which makes sense: each A4000 has 448GB/s, so 448GB/s / 70GB = 6.4 tok/s max.
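For anyone who wants to redo the back-of-envelope math, here's a rough sketch. It assumes decoding is purely memory-bound (every generated token streams all ~70GB of weights once), ignores KV cache and other overhead, and assumes 8 bytes per transfer per DDR5 channel. The per-GPU figure assumes the layers are split across the cards so only one card is reading weights at a time.

```python
def ddr_bandwidth_gbs(channels: int, mts: int, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s: channels * MT/s * bytes per transfer."""
    return channels * mts * bytes_per_transfer / 1000


def max_tokens_per_s(bandwidth_gbs: float, model_gb: float) -> float:
    """Upper bound for memory-bound decoding: one full pass over the weights per token."""
    return bandwidth_gbs / model_gb


MODEL_GB = 70.0  # approx. size of a 70B model at Q8 (assumption: ~1 byte per parameter)

cpu_bw = ddr_bandwidth_gbs(channels=8, mts=6400)  # 8 x DDR5-6400 -> 409.6 GB/s
gpu_bw = 448.0  # A4000 spec bandwidth; assumes only one card reads weights at a time

cpu_ceiling = max_tokens_per_s(cpu_bw, MODEL_GB)
gpu_ceiling = max_tokens_per_s(gpu_bw, MODEL_GB)

# Compare against the measured numbers from the post (1 and 6.1 tok/s).
print(f"CPU: ceiling {cpu_ceiling:.1f} tok/s, measured 1.0 -> {1.0 / cpu_ceiling:.0%} of peak")
print(f"GPU: ceiling {gpu_ceiling:.1f} tok/s, measured 6.1 -> {6.1 / gpu_ceiling:.0%} of peak")
```

By that estimate the GPUs hit roughly 95% of their bandwidth ceiling while the CPU only gets around 17%, which is the gap I'm complaining about.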