Upstart retrofits an Nvidia GH200 server into a workstation (theregister.com)
51 points by jjgreen on Feb 16, 2024 | hide | past | favorite | 39 comments


https://gptshop.ai/#info :

  .. Nvidia GH200 Grace Hopper Superchip-powered supercomputer ...

  "What is the difference to alternative systems with the same amount of memory?
  - Compared to 8x Nvidia H100, GH200 costs 5x less, consumes 10x less energy and has roughly the same performance.
  - Compared to 8x Nvidia A100, GH200 costs 3x less, consumes 5x less energy and has a higher performance.
  - Compared to 4x AMD Mi300X, GH200 costs 2x less, consumes 4x less energy and has probably roughly the same performance.
  - Compared to 4x AMD Mi300A (which has only 512 GB memory, more is not possible because the maximum number of scale-up infinity links is 4), GH200 costs significantly less, consumes 3x less energy and has probably a higher performance.
  - Compared to 8x Nvidia RTX A6000 Ada which has significantly less memory (only 384GB), GH200 costs significantly less, consumes 3x less energy and has a higher performance.
  - Compared to 8x AMD Radeon PRO W7900 which has significantly less memory (only 384GB), GH200 costs the same, consumes 3x less energy and has a higher performance."


This is a weird comparison, written so that everything looks favorable to the GH200.

There are a bunch of tradeoffs that aren't considered, and some of the comparisons don't make sense: the GH200 is a CPU+GPU, so comparing it against GPUs alone is odd.

There is no such thing as a 4x MI300 chassis; they are all 8x.


Considering the system only has a single H100, why would it be that performant?


Yeah, this page is full of straight-up lies?

“ Its performance in every regard is almost unreal (up to 284 times faster than x86).”

Like, there are at least 3 things wrong with that statement!


benchmark:

"NVIDIA GH200 CPU Performance Benchmarks Against AMD EPYC Zen 4 & Intel Xeon Emerald Rapids"

* https://www.phoronix.com/review/nvidia-gh200-gptshop-benchma...


"I started experimenting with Nvidia's RTX 4090s. I bought a bunch of them and put them into a mining rack and just ran some tests. I quickly figured out that is not the way to go,"

Well, I hope they were at least smart enough to use PCIe 4.0 x16... otherwise that will never, ever work well. Two 4090s saturate my PCIe 4.0 x16 bus during training workloads. If they were on PCIe 3.0 x1 risers, that's 32x lower inter-GPU bandwidth!
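A rough back-of-envelope behind that 32x figure (approximate usable per-lane throughput; exact numbers vary with protocol overhead):

  # Approximate per-direction PCIe throughput, GB/s per lane
  per_lane = {"3.0": 0.985, "4.0": 1.969}
  gen4_x16 = per_lane["4.0"] * 16   # ~31.5 GB/s
  gen3_x1  = per_lane["3.0"] * 1    # ~1.0 GB/s
  print(round(gen4_x16 / gen3_x1))  # ~32, i.e. "32x lower inter-GPU bandwidth"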

Also, for 50K per GPU plus all this hacking, you could buy an older 8x A100 system with water cooling or something.


Really depends on the model and the software tricks you're using. With DDP and gradient accumulation, you can reduce the bandwidth bottleneck by quite a bit. We've trained with 4090s running at x4 lanes with very small impact. And running at x4 means you can stuff up to 26-28 GPUs on a single CPU node (say EPYC), get PCIe-level latency, and get rid of the networking hassle.
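For reference, a minimal sketch of that DDP + gradient accumulation pattern (PyTorch; `loader` and the accumulation factor are placeholders). Wrapping the non-final micro-batches in no_sync() skips the gradient all-reduce, so inter-GPU traffic drops roughly by the accumulation factor:

  import contextlib
  import torch
  import torch.distributed as dist
  from torch.nn.parallel import DistributedDataParallel as DDP

  dist.init_process_group("nccl")
  device = torch.device("cuda", dist.get_rank() % torch.cuda.device_count())
  model = DDP(torch.nn.Linear(4096, 4096).to(device), device_ids=[device.index])
  opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
  accum = 8  # hypothetical: sync gradients once every 8 micro-batches

  for step, (x, y) in enumerate(loader):  # `loader` assumed to exist
      sync_now = (step + 1) % accum == 0
      ctx = contextlib.nullcontext() if sync_now else model.no_sync()
      with ctx:
          loss = torch.nn.functional.mse_loss(model(x.to(device)), y.to(device))
          (loss / accum).backward()       # grads accumulate locally between syncs
      if sync_now:                        # the all-reduce fires on this backward pass
          opt.step()
          opt.zero_grad(set_to_none=True)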


Interesting, I would expect the impact to be noticeable at x4! And yeah, it heavily depends on the model, the sharding method, and model vs. data parallelism. I'm hitting peak bandwidth because of a very wide, shallow model that is split across the GPUs model-parallel, with CPU optimizer offload - so worst-case scenario there.

But it does kind of validate Nvidia's choice to remove NVLink. How useful would it really be if x4 PCIe gets reasonably decent perf? Unless your inner dim is massive or something, you should be fine.


Do you have any pictures and/or documentation of that setup, power draw and performance? It sounds pretty interesting!


Never got around to writing any public docs. It's essentially a bunch of GPUs on custom aluminum extrusion frames sitting in a server rack, connected to a ROMED8-2T motherboard through PCIe splitters.

Power limited to 240 W, negligible performance loss while halving energy usage; uses three 20 A circuits.

Performance can range anywhere from 2x 4090 = 1x A100 to 4x 4090 = 1x A100, depending on the model, etc.

It's great value for the money, and very easy to resell as well.


Very nice!

240W?

3 x 20A = 6600W?


I meant each card is limited to 240 W, instead of the usual 450 W. Also, it's more like 4 circuits after all, because the main CPU/motherboard/2 GPUs are on a 15 A circuit too.
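For reference, the per-card cap gets set along these lines (a sketch; the GPU count here is an assumption):

  # nvidia-smi flags: -i selects a GPU index, -pl sets the power limit in watts (needs root)
  import subprocess
  for idx in range(8):  # assumed GPU count, adjust to the actual rig
      subprocess.run(["nvidia-smi", "-i", str(idx), "-pl", "240"], check=True)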


Ah! OK, thank you, now I get it. That's a very nice rig you have there. So, at a guess, you didn't care as much about peak compute capacity, as long as whatever you are doing fits in GPU memory, and this is your way of collecting that much memory in a single machine while still having reasonable interconnect speeds between the GPUs?


Yeah, it's really just trying to get as much compute as possible, as cheaply as possible, interconnected in a reasonably fast way with low latency. Slow networking would be a bottleneck, and expensive high-end networking would defeat the purpose of staying cheap.


You'd be surprised at how cheap high-end networking that outperforms PCIe 4.0 x4 is - 100Gb Omni-Path NICs are going for $20 on eBay! And those will come close to saturating PCIe 3.0 x16.

Though of course with multiple boards/RAM/CPUs it gets complicated again.
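Back-of-envelope for that comparison (raw link rate vs. approximate per-direction PCIe throughput, overhead ignored):

  nic       = 100 / 8        # ~12.5 GB/s for a 100Gb link
  pcie4_x4  = 1.969 * 4      # ~7.9 GB/s
  pcie3_x16 = 0.985 * 16     # ~15.8 GB/s
  print(nic > pcie4_x4)      # True: the link outruns PCIe 4.0 x4
  print(nic, pcie3_x16)      # and the NIC gets close to, but doesn't quite fill, a Gen3 x16 slot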


Which cards? I've been looking at NICs but couldn't find cheap ones past 25-40Gb.


Omni-Path is/was Intel's fork of InfiniBand, which, from rough memory, they bought from QLogic some years ago.

* Switch: https://www.ebay.com/itm/273064154224

* Adapters: https://www.ebay.com/itm/296188773061 / https://www.ebay.com/itm/166199451199

* Cables: No idea, sorry. ;)

* Description: https://www.youtube.com/watch?v=dOIXtsjJMYE

Note that I don't know those eBay sellers at all; they're just some of the cheaper results showing up from searching. There seem to be plenty of other results too. :)

---

Possibly useful, though it's Proxmox focused:

https://forum.level1techs.com/t/proxmox-with-intel-omni-path...


Very smart approach. I may copy your setup for some project that I've been working on for years but that stalled waiting for more memory in GPUs.


240W per card probably


Indeed.


Wouldn't an EPYC-based motherboard (lots of PCIe x16 slots) generally be the right choice for testing multiple 4090s, rather than a mining rack?


A mining rack would mean the metal frame the 4090s are mounted on, not the motherboard. Since they pulled off this build, we can assume they were smart enough to use a server motherboard with enough PCIe slots and lanes.


Ahhh. I was thinking they might have meant one of the mining-specific motherboards, which commonly have a bunch of PCIe x16 physical slots with only PCIe x1 electrical connections.

This kind of thing:

https://www.asrock.com/mb/amd/x370%20pro%20btc+


$50k is a steal when you factor in the number of compassionately-worded severance letters you can generate without having to use the Cloud


> ... and due to his preference for keeping work out of the cloud

What a German motivation! I too am all for keeping things under your own control, and this is certainly a very cool exercise... but I don't quite see why you can't just... use the server as a server. Then you can connect to it from any portable device.

Really, what's the point of "workstations" nowadays, at least for this type of application?


Or: I could just put the OEM version into a rack in another room. Why does it need to be on my desk? It’s not like this is some kind of graphics board that needs to sit a short distance from my monitor.


A lot of people in Europe don’t have an extra room to put it in. Me neither, signed US metro area dweller.


If you were really serious about this, I would think throwing the server into a colocation center would be the way to go. I believe those costs can be as low as a few hundred a month. Security, power, and cooling no longer your problem.


Definitely. The amps are the expensive part, but if you have a consistently high load, moving off residential rates might mean you're only paying $100-200 extra for all those other advantages.


If you're dropping $50k on a workstation, you are either a business or in the upper tier of renters/homeowners. Nobody who can afford this is living in a closet out of necessity.


50k to drop on a server? You can afford the space or colo.


> It’s not like this is some kind of graphics board that needs to sit a short distance from my monitor.

I wouldn't be surprised if some of the buyers consider it a gaming rig with maximum bragging rights.


It's not a video card; it does not have video outputs.


Yeah, for $51,000 I will happily run a rack of 4090s.


Maybe ok in your house, but if you want to scale out...

https://www.nvidia.com/en-us/drivers/geforce-license/

No Datacenter Deployment. The SOFTWARE is not licensed for datacenter deployment, except that blockchain processing in a datacenter is permitted.


€50 000? In 10 years I will buy it for €5000.


Someone call Linus Tech Tips right now!!!


> Of course it's bristling with Noctuas – how else do you cool a 1kW desktop?

Is this implying that Noctuas are good at cooling?


Yes?



