Those are very different questions...

If you want to simply run inference or do QLoRA fine-tunes of "the best, largest open source models", e.g. the llama2-70b models, you can do so with 2 x RTX 3090 24GB (~$600 each used), so about $1200 for the GPUs and 48GB of VRAM (power-limited to 300W each, so 600W while inferencing) - a q4 quant of llama2-70b takes about 38-40GB of memory plus kv-cache.
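
For context, that's roughly what loading it looks like with the Hugging Face transformers + bitsandbytes stack (just a sketch - the repo name and generation settings are placeholders, and the gated weights have to be downloaded separately):

  # Sketch: load llama2-70b in 4-bit (NF4), sharded across 2 x 24GB GPUs.
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

  quant_cfg = BitsAndBytesConfig(
      load_in_4bit=True,                      # ~0.5 bytes/param -> ~35GB of weights for 70B
      bnb_4bit_quant_type="nf4",
      bnb_4bit_compute_dtype=torch.bfloat16,
  )

  model = AutoModelForCausalLM.from_pretrained(
      "meta-llama/Llama-2-70b-hf",            # placeholder repo name
      quantization_config=quant_cfg,
      device_map="auto",                      # splits the layers across both 3090s
  )
  tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-70b-hf")

  inputs = tokenizer("The cheapest way to run a 70B model at home is",
                     return_tensors="pt").to(model.device)
  out = model.generate(**inputs, max_new_tokens=64)
  print(tokenizer.decode(out[0], skip_special_tokens=True))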

If you want 192GB of VRAM, your cheapest realistic option is probably going to be 4 x A6000s (48GB each, ~$16,000 total) - you will need a chassis that provides adequate power and cooling (1200W for the GPUs alone). I'd personally suggest that anyone looking to buy that kind of hardware have a fairly good idea of what they're going to use it for beforehand.
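
The back-of-the-envelope sizing is just weights-only arithmetic (illustrative; it ignores kv-cache, activations, and the scale/zero-point overhead real q4 formats carry, which is why the actual 70B q4 footprint lands closer to the 38-40GB above):

  # Weights-only VRAM estimate at a given precision.
  def weight_gb(params_billion: float, bytes_per_param: float) -> float:
      return params_billion * 1e9 * bytes_per_param / 1024**3

  print(f"70B @ fp16: {weight_gb(70, 2.0):6.1f} GB")   # ~130 GB -> needs the 4 x A6000 class
  print(f"70B @ q4  : {weight_gb(70, 0.5):6.1f} GB")   # ~33 GB  -> fits in 2 x 3090 (48 GB)
  print(f"4 x A6000 : {4 * 48} GB total VRAM")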

I'm not sure what exactly you're asking about with regard to memory, but for workstations, the Xeon W-3400s have 8 channels of DDR5-4800 (the W5-3425 has a $1,200 list price), and the upcoming Threadripper Pro 7000 series will likely have similar memory support (or you can get an EPYC 9124 for ~$1,200 now if you want 12 channels of DDR5).
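
If the question is about CPU inference, the reason channel count matters is that token generation is mostly memory-bandwidth-bound; a rough theoretical ceiling looks like this (illustrative numbers only):

  # Theoretical peak bandwidth: channels * MT/s * 8 bytes per 64-bit transfer.
  def bandwidth_gbs(channels: int, mts: int) -> float:
      return channels * mts * 8 / 1000

  print(f"Xeon W-3400, 8ch DDR5-4800 : {bandwidth_gbs(8, 4800):.0f} GB/s")   # ~307 GB/s
  print(f"EPYC 9124, 12ch DDR5-4800  : {bandwidth_gbs(12, 4800):.0f} GB/s")  # ~461 GB/s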


