
Expandable VRAM on GPUs has been tried before - the industry just hates it. It's like Apple devices - want more internal storage? Buy a new computer so we can have the fat margins.

The original Rev. A iMac in the late '90s had slotted memory for its ATI card, as one example - it shipped with 2 MB and could be upgraded to 6 MB after the fact with a 4 MB SGRAM DIMM. There are also a handful of more recent examples floating around.

While I'm sure there are also packaging advantages to be had by soldering memory chips directly instead of slotting them, etc., I strongly suspect that the desire to keep buyers upgrading the whole card ($$$) every few years massively outweighs this if you are a GPU vendor.

Put another way, what's in it for the GPU vendor to offer memory slots? Possibly reduced revenue, if it became the industry norm.



Expansion has to answer one fundamental question: if you're likely to need more X tomorrow, why aren't you just buying it today?

The answer to this question almost has to be "because it will be cheaper to buy it tomorrow." However, GPUs bundle together RAM and compute. If RAM is likely to be cheaper tomorrow, isn't compute also probably going to be cheaper?

If both RAM and compute are likely cheaper tomorrow, then the calculus still probably points towards a wholesale replacement. Why not run/train models twice as quickly alongside the RAM upgrades?

> I strongly suspect the desire to keep buyers upgrading the whole card ($$$) every few years trumps this massively if you are a GPU vendor.

Remember as well that expandable RAM doesn't unlock a higher-bandwidth memory interface. If you could take a card from five years ago and load it up with 80 GB of VRAM, you still wouldn't see the memory bandwidth of a newly bought H100.
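
To put rough numbers on that (a back-of-envelope sketch of my own, with illustrative figures rather than benchmarks): for memory-bandwidth-bound LLM decoding, tokens per second is roughly capped at memory bandwidth divided by the bytes of weights streamed per token, so an old card stuffed with 80 GB of VRAM still has a much lower ceiling than an H100.

    # Back-of-envelope sketch; the bandwidth figures are assumptions, not benchmarks.
    # For bandwidth-bound decoding, each generated token streams the model weights
    # through the memory bus once, so tokens/s <= bandwidth / model size.

    def decode_ceiling_tok_s(mem_bandwidth_gb_s: float, model_size_gb: float) -> float:
        """Upper bound on single-stream decode speed when memory-bound."""
        return mem_bandwidth_gb_s / model_size_gb

    MODEL_GB = 70.0        # assume a ~70 GB model that just fits in 80 GB of VRAM
    OLD_CARD_BW = 450.0    # GB/s, ballpark for a ~5-year-old GDDR6 card (assumption)
    H100_SXM_BW = 3350.0   # GB/s, roughly the H100 SXM HBM3 spec

    print(f"old card + 80 GB VRAM: ~{decode_ceiling_tok_s(OLD_CARD_BW, MODEL_GB):.0f} tok/s ceiling")
    print(f"H100:                  ~{decode_ceiling_tok_s(H100_SXM_BW, MODEL_GB):.0f} tok/s ceiling")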

If instead you just need the VRAM and don't care much about bandwidth/latency, then it seems like you'd be better off using unified memory and having system RAM be the ultimate expansion.
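
As a minimal sketch of that unified-memory route (my own illustration, assuming PyTorch on an Apple Silicon machine, where the "mps" backend allocates GPU tensors out of ordinary system RAM): adding system RAM directly grows the pool the GPU can draw from, just at system-memory bandwidth rather than HBM speed.

    # Minimal sketch, assuming PyTorch on a unified-memory machine (e.g. Apple
    # Silicon via the "mps" backend). GPU tensors live in system RAM, so adding
    # RAM effectively adds "VRAM", only at system-memory bandwidth.
    import torch

    device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")

    # ~2 GB of float32 "weights": on a discrete GPU this must fit in fixed VRAM;
    # here it only has to fit in (expandable) system memory.
    weights = torch.randn(32, 4096, 4096, device=device)
    x = torch.randn(1, 4096, device=device)

    y = x @ weights[0]   # the matmul runs on the GPU, reading unified memory
    print(y.shape, "computed on", device)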


> The answer to this question almost has to be "because it will be cheaper to buy it tomorrow."

No, it doesn't. It could just as easily be "because I will have more money tomorrow." If faster compute is $300 and more VRAM is $200 and I have $300 today and will have another $200 two years from now, I might very well like to buy the $300 compute unit and enjoy the faster compute for two years before I buy the extra VRAM, instead of waiting until I have $500 to buy both together.

But for something that is already a modular component, like a GPU, it's mostly irrelevant. If you have $300 now, you buy the $300 GPU; then in two years, when you have another $200, you sell the one you have for $200 and buy the one that costs $400 - which is the same one that cost $500 two years ago.

This is a very different situation from fully integrated systems, because those have components that lose value at different rates, or that make sense to upgrade separately. You buy a $1000 tablet, and then the battery wears out and it doesn't have enough RAM, so you want to replace the battery and upgrade the RAM, but you can't. The battery is proprietary and discontinued, and the RAM is soldered. So even though that machine has a satisfactory CPU, storage, chassis, screen and power supply - still $700 worth of components - the machine is only worth $150, because nothing is modular and nobody wants it: it doesn't have enough RAM and the battery dies after 10 minutes.


Hmm, it seems you're replying as a customer, but not as a GPU vendor...

The thing is, there's not enough competition in the AI GPU space.

Currently, the only option for not wasting time getting some random research project from GitHub running? Buy a card from Nvidia. CUDA can run almost anything on GitHub.

AMD GPUs? That really depends...

And gamers often don't need more than 12 GB or so of GPU RAM for running games at 4K, so most high-VRAM customers are in the AI field.

> If you could take the card from five years ago and load it up with 80 GB of VRAM, you'd still not see the memory bandwidth of a newly-bought H100.

This is exactly what Nvidia will fight tooth and nail - if it were possible, its profit margin could be slashed to a half or even an eighth.
