> So, typical database operations, massively parallel in nature like join or filter, would run about that much faster.
Given workload A, how much of the total runtime would JOIN or FILTER take compared to, say, the storage engine layer? My gut feeling tells me not much, since to see the actual gain you'd need to be able to parallelize everything, including the storage engine.
IIRC all the startups building databases around GPUs failed to deliver in the last ~10 years. All of them have shut down, if I am not mistaken.
With cheap, large RAM and SSDs, storage has already become much less of an issue, especially when the database is primarily an in-memory one.
How about attaching SSD-based storage to NVLink? :) Nvidia does have direct-to-memory tech and uses wide buses, so I don't see any issue with them direct-attaching arrays of SSDs if they feel like it.
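For concreteness, the "direct-to-memory tech" here is presumably GPUDirect Storage: its cuFile API lets an NVMe DMA transfer land straight in GPU memory, skipping the CPU bounce buffer. A minimal sketch, assuming a Linux box with GDS installed (the file path and sizes are made up; error checks omitted for brevity):

```c
// Hypothetical read of a table partition directly into GPU memory via cuFile.
#define _GNU_SOURCE            // for O_DIRECT
#include <fcntl.h>
#include <unistd.h>
#include <cuda_runtime.h>
#include <cufile.h>

int main(void) {
    cuFileDriverOpen();                                  // init the GDS driver

    int fd = open("/data/table.part", O_RDONLY | O_DIRECT);  // hypothetical path
    CUfileDescr_t descr = {0};
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    descr.handle.fd = fd;
    CUfileHandle_t fh;
    cuFileHandleRegister(&fh, &descr);

    size_t len = 1 << 20;                                // 1 MiB, for illustration
    void *dev_buf;
    cudaMalloc(&dev_buf, len);
    cuFileBufRegister(dev_buf, len, 0);                  // pin GPU buffer for DMA

    // NVMe -> GPU memory, no intermediate copy through host RAM
    cuFileRead(fh, dev_buf, len, /*file_offset=*/0, /*devPtr_offset=*/0);

    cuFileBufDeregister(dev_buf);
    cuFileHandleDeregister(fh);
    cudaFree(dev_buf);
    close(fd);
    cuFileDriverClose();
    return 0;
}
```

That path already exists over PCIe today; hanging SSD arrays directly off NVLink would be the same idea with a fatter pipe.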
>IIRC all the startups building databases around GPUs failed to deliver in the last ~10 years. All of them have shut down, if I am not mistaken.
As I already said, the model of a database offloading some ops to a GPU with its own separate memory isn't feasible, and those startups confirmed it, especially when the GPU has 8-16GB of memory while main RAM can easily be 1-2TB with 100-200 CPU cores. With 128GB of unified memory like on the GB10, the situation looks completely different (that Nvidia allows only two of them to be connected over NVLink is just market segmentation, not a real technical limitation).
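To make the unified-memory point concrete, here's a minimal CUDA sketch (the filter kernel, sizes, and threshold are invented for illustration): with cudaMallocManaged, the CPU-side code and the GPU kernel share one allocation, so the GPU can filter a column in place instead of first squeezing the working set through a discrete 8-16GB card.

```cuda
// Toy column filter over unified memory: no explicit host<->device copies.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void filter_gt(const int *col, int *flags, int n, int threshold) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) flags[i] = col[i] > threshold;
}

int main() {
    const int n = 1 << 20;
    int *col, *flags;
    cudaMallocManaged(&col, n * sizeof(int));    // one allocation, visible to
    cudaMallocManaged(&flags, n * sizeof(int));  // both CPU and GPU

    for (int i = 0; i < n; i++) col[i] = i % 100;  // "storage engine" fills it

    filter_gt<<<(n + 255) / 256, 256>>>(col, flags, n, 42);
    cudaDeviceSynchronize();

    printf("flags[43] = %d\n", flags[43]);       // CPU reads the result directly
    cudaFree(col);
    cudaFree(flags);
    return 0;
}
```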
I mean, you wouldn't run a database on a GB10 device or a cluster thereof. GH200 is another story; however, the potential improvement w.r.t. databases-on-GPUs still hinges on whether there are enough workloads that are compute-bound for a substantial part of their total wall-clock time.
In other words, and hypothetically, if you can make logical plan execution run 2x faster by rewriting the algorithms to use GPU resources, but physical plan execution remains bottlenecked by the storage engine, then the total gain is negligible.
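That's just Amdahl's law. A toy calculation with made-up numbers shows how fast the end-to-end gain evaporates:

```c
// Amdahl's law with illustrative numbers only (not measurements).
#include <stdio.h>

// p: fraction of wall-clock time in the offloaded ops (e.g. JOIN/FILTER)
// s: speedup of that fraction on the GPU
static double overall_speedup(double p, double s) {
    return 1.0 / ((1.0 - p) + p / s);
}

int main(void) {
    printf("%.2fx\n", overall_speedup(0.30, 2.0));   // ~1.18x end to end
    printf("%.2fx\n", overall_speedup(0.30, 1e12));  // even an infinitely fast
    return 0;                                        // GPU caps out at ~1.43x
}
```

If the storage engine owns 70% of the wall clock, even an infinitely fast GPU buys less than 1.5x overall.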
But I guess there could be some use case where this proves to be a win.