
Gracana 5 months ago | on: Run DeepSeek R1 Dynamic 1.58-bit

Maybe EPYC can make better use of the available bandwidth, but for comparison I have a water-cooled Xeon W5-3435X running at 4.7GHz all-core with 8 channels of DDR5-6400, and CPU inference is still dog slow. With a 70B Q8 model I get 1 tok/s, which is a lot less than I thought I would get with 410GB/s max RAM bandwidth. If I run on 5x A4000s I get 6.1 tok/s, which makes sense: 448GB/s / 70GB = 6.4 tok/s max.
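
For anyone who wants to redo the arithmetic above, here is a rough sketch of the bandwidth-bound estimate. It assumes every generated token reads all the weights from memory once, that a 70B Q8 model is about 70GB (the figure used in the comment), and that each DDR5 channel moves 8 bytes per transfer. The function names are made up for illustration, and real throughput will land below these ceilings.

```python
def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak memory bandwidth in GB/s.

    Assumes a 64-bit (8-byte) data path per channel.
    """
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000


def max_tokens_per_s(bandwidth_gbs: float, model_size_gb: float) -> float:
    """Upper bound on tokens/s if each token streams the whole model once."""
    return bandwidth_gbs / model_size_gb


# 8 channels of DDR5-6400, as on the Xeon W5-3435X in the comment
xeon_bandwidth = peak_bandwidth_gbs(channels=8, mega_transfers_per_s=6400)
print(f"Xeon peak bandwidth: {xeon_bandwidth:.1f} GB/s")
print(f"Xeon ceiling, 70GB model: {max_tokens_per_s(xeon_bandwidth, 70):.2f} tok/s")

# 448GB/s is the GPU bandwidth figure quoted in the comment
print(f"A4000 ceiling, 70GB model: {max_tokens_per_s(448, 70):.2f} tok/s")
```

By this estimate the Xeon's ceiling is a bit under 6 tok/s, so the measured 1 tok/s is well short of what memory bandwidth alone would allow, while the 6.1 tok/s on the A4000s is close to its ceiling.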

Very strange, as I get 2 tok/s on an old i5-12400 with DDR4 running a 14B Q8 model.