I think you are overestimating compute and I/O for this model. If you assume it is RAM bandwidth bound, with a single channel top DDR4 you will get inference time as a low multiple of 7 seconds (200GB/25GBs). In a workstation you can have 8 channels.
12-channels in mine. 24-channels on some configurations, though I think that is the upper limit at this time, with a maximum density of 512GB per channel.