
> most of the compute time spent is in serializing and deserializing data.

This should be viewed in light of how hardware is evolving now. CPU compute power is no longer growing much (at least for individual cores).

But one thing that's still doubling on a regular basis is memory capacity of all kinds (RAM, SSD, etc) and bandwidth of all kinds (PCIe lanes, networking, etc). This divide is getting large and will only continue to increase.

Which brings me to my main point:

You can't afford to be serializing/deserializing data on the CPU. What you want is to have the CPU coordinate the SSD to copy chunks directly - and as-is - to the NIC/app/etc.

Short of having your RAM doing compute work*, you would be leaving performance on the table.

----

* Which is starting to appear (https://www.upmem.com/technology/), but that's not quite there yet.
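To make the "CPU coordinates, kernel moves the bytes" idea concrete, here's a minimal sketch using `os.sendfile`, which wraps Linux's `sendfile(2)`: the CPU issues one syscall per chunk and the kernel copies data between file descriptors without the bytes ever passing through a userspace buffer. The file names and payload here are made up for illustration; copying file-to-file (rather than file-to-socket) assumes a reasonably modern Linux kernel.

```python
import os
import tempfile

# Fabricated payload standing in for "serialized data" already in the
# right wire format, so it can be forwarded as-is.
payload = b"chunk-of-serialized-data" * 1024

src = tempfile.NamedTemporaryFile(delete=False)
src.write(payload)
src.flush()
dst = tempfile.NamedTemporaryFile(delete=False)

with open(src.name, "rb") as fin, open(dst.name, "wb") as fout:
    size = os.fstat(fin.fileno()).st_size
    sent = 0
    while sent < size:
        # The kernel copies the bytes; userspace never touches them.
        # `sent` doubles as the read offset into the source file.
        sent += os.sendfile(fout.fileno(), fin.fileno(), sent, size - sent)
```

The same pattern is how web servers ship static files to a NIC: the CPU's job shrinks to bookkeeping (offsets, completions), which is exactly the "coordinate, don't touch" role described above.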



> What you want is to have the CPU coordinate the SSD to copy chunks directly -and as is- to the NIC/app/etc.

Isn't that what DMA is supposed to be?

Also, there's work in getting GPUs to load data straight from NVME drives, bypassing both the CPU and system memory. So you could certainly do similar things with the PCIE bus.

https://developer.nvidia.com/blog/gpudirect-storage/

A big problem is that a lot of data isn't laid out in a way that's ready to be stuffed in memory. When you see a game spending a long time loading data, that's usually why. The CPU will do a bunch of processing to map on-disk data structures to a more efficient in-memory representation.

If you can improve the on-disk representation to more closely match what's in memory, then CPUs are generally more than fast enough to copy bytes around. They can definitely copy faster than system RAM can feed them.
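A small sketch of that idea: if records are stored on disk in a fixed binary layout that matches how you'd hold them in memory, "loading" collapses to mapping the file and indexing into it; there is no per-field parse pass. The record layout (two little-endian uint32s) and file contents below are invented for illustration.

```python
import mmap
import struct
import tempfile

# Hypothetical fixed on-disk layout: records of two uint32 fields,
# packed little-endian, back to back.
RECORD = struct.Struct("<II")

# Write 1000 records in that layout (stand-in for a game asset file).
with tempfile.NamedTemporaryFile(delete=False) as f:
    for i in range(1000):
        f.write(RECORD.pack(i, i * 2))
    path = f.name

# "Load" the file by mapping it; no deserialization loop runs.
with open(path, "rb") as f:
    buf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

# Random access straight into the mapped bytes.
a, b = RECORD.unpack_from(buf, 500 * RECORD.size)
```

The cost of "loading" is now page faults and memcpy-speed reads, which is the regime where the CPU easily keeps up with storage and RAM; the expensive part was only ever the format translation.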


This is backward -- this sort of serialization is overwhelmingly bottlenecked on bandwidth (not CPU). (Multi-core) compute improvements have been outpacing bandwidth improvements for decades and have not stopped. Serialization is a bottleneck because compute is fast/cheap and bandwidth is precious. This is also reflected in the relative energy to move bytes being increasingly larger than the energy to do some arithmetic on those bytes.


An interesting perspective on the future of computer architecture, but it doesn't align well with my experience. CPUs are easier to build, and although a lot of ink has been spilled about the end of Moore's Law, it remains the case that we are still on Moore's curve for transistor count, and for roughly the past 15 years we have also been on a similar slope for cores per CPU. We also still enjoy increasing single-thread performance, even if not at the rates of past innovation.

DRAM, by contrast, is currently stuck. We need materials science breakthroughs to get beyond the capacitor aspect ratio challenge. RAM is still cheap but as a systems architect you should get used to the idea that the amount of DRAM per core will fall in the future, by amounts that might surprise you.



