I don't for a minute believe Deepseek v3 was built with a $6M rental.
Their paper (arXiv:2412.19437) explains they used 2048 H800s. A computer cluster based on 2048 GPUs would have cost around $400M about two years ago when they built it. (Give or take, feel free to post corrections.)
The point is they got it done cheaper than OpenAI/Google/Meta/... etc.
But not cheaply.
I believe the markets are overreacting. Time to buy (tinfa).
They pointed out that the cost calculation assumes those GPUs were rented at $2/hr. They are not factoring in the prior cost of buying those H800s, because they didn't buy them just to build R1. They are not factoring in the cost to build v2 or v2.5; the cost is to build V3. The cost to build R1-Zero and R1 on top of V3 seems far cheaper, and they didn't mention it. They are not factoring in the cost to build out their datacenter, or salaries. Just the training cost. They made it clear: if you could rent equivalent GPUs at $2/hr, it would cost you about $6 million.
"Assuming the rental price of the H800 GPU is $2 per GPU hour, our total training costs amount to only $5.576M. Note that the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs associated with prior research and ablation experiments on architectures, algorithms, or data."
V3 was released a bit over a month ago. V3 is not what took the world by storm, though; R1 is. But the price everyone is talking about is the price for V3.
If this weren't an attempt to sell a false equivalency, at least one story would have details on the equivalent rental cost of compute used to train closed source frontier models from OpenAI, Anthropic, Mistral... Lack of clarity makes it a story.
>>Just the training cost. They made it clear. If you could rent equivalent GPUs at $2/hr, it would cost you about $6million.
This is still quite impressive, given most people are more likely to rent cloud infrastructure from AWS or Azure than to build their own datacenter. So the math checks out.
I don't think compute capacity built already will go waste, likely more and bigger things will get built in the coming years so most of it will be used for that purpose.
You’re confusing the metric for reality. The point is to compare the cost of training in terms of node hours with a given configuration. That’s how you get apples to apples. Of course it doesn’t cover building the cluster, housing the machine, the cleaning staff’s pension, or whatever.
The math they gave was 2,788,000 H800 GPU hours[1], with a rental price of $2/GPU-hour[1], which works out to $5.6M. If they did that on a cluster of 2048 H800s, then they could re-train the model every ~1400 hours (~2 months).
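That arithmetic can be checked directly (the GPU-hour count and rental rate are from the V3 paper; the 2048-GPU cluster size is from the same report):

```python
# Back-of-the-envelope check of the figures quoted above.
gpu_hours = 2_788_000      # total H800 GPU-hours, per the V3 paper
rental_rate = 2.0          # USD per GPU-hour, per the V3 paper
cluster_gpus = 2048        # cluster size

training_cost = gpu_hours * rental_rate
wall_clock_hours = gpu_hours / cluster_gpus

print(f"Training cost: ${training_cost / 1e6:.3f}M")      # $5.576M
print(f"Wall clock: {wall_clock_hours:.0f} hours "
      f"(~{wall_clock_hours / 24 / 30:.1f} months)")      # ~1361 hours, ~2 months
```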
If they paid $70,000 per GPU[2] plus $5000 per 4-GPU compute node (random guess), then the hardware would have cost about $150M to build. If you add in network hardware and other data-centery-things, I could see it reaching into the $200M range. IMO $400M might be a bit of a stretch but not too wildly off base.
To reach parity with the rental price, they would have needed to re-train 70 times (i.e. over 12 years). They obviously did not do that, so I agree it's a bit unfair to cost this based on $2/hr GPU rentals. Why did they buy instead of rent? Probably because it's not actually that cheap to get 2048 concurrent high-performance connected GPUs for 60 days. Or maybe just because they had cash for capex.
Something like an H100 is definitely a feat of engineering, though.
Nothing prevents Cooler Master from releasing a line of equally performant GPUs and, while at it, even cheaper ones. But when we measure reality, after the wave function of hype and good intentions collapses... oh yeah, turns out only Nvidia is making those chips, whoops...
Looking around a bit, the price was ~$70k USD _in China_ around the time they were released in 2023; cheaper bulk deals were a thing, too.
Note that these are the China prices, with a high markup due to export controls etc.
The price of an H800 80GB in the US today is more like ~$32k USD.
But to use H800 clusters well you also need the fastest possible interconnects, enough motherboards, enough fast storage, cooling, a building, interruption-free power, etc. So the cost of building an H800-focused datacenter is much, much higher than multiplying GPU cost by GPU count.
You can’t buy the GPUs individually, and even if you can on a secondary market, you can’t use them without the baseboard, you can’t use the baseboard without a compatible chassis, and a compatible chassis is full of CPUs, system memory, etc. On top of that, you need a fabric. Even if you cheap out and go RoCE instead of InfiniBand, it’s still 400 Gb/s HCAs, optics, and switches.
Yea, a node in a cluster costs as much as an American house. Maybe not on its own, but to make it useful for large scale training, even under the new math of deepseek, it costs as much as a house.
They estimated $200k for a single NVIDIA GPU-based server, complete with RAM and networking. That's where my number came from. (RAM, and especially very-high-speed networking, is very expensive at these scales.)
"Add it all up, and the average selling price of an Nvidia GPU accelerated system, no matter where it came from, was just under $180,000, the average server SXM-style, NVLink-capable GPU sold for just over $19,000 (assuming the GPUs represented around 85 percent of the cost of the machine)"
That implies they assumed an 8-GPU system. (8 × $19,000 = $152,000 ≈ 85% × $180,000)
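That inference can be reproduced from the quoted averages:

```python
# Implied GPU count per system, from the quoted average prices.
avg_system_price = 180_000   # USD, average Nvidia-accelerated system
avg_gpu_price = 19_000       # USD, average SXM-style NVLink GPU
gpu_share = 0.85             # GPUs assumed to be ~85% of system cost

gpus_per_system = avg_system_price * gpu_share / avg_gpu_price
print(f"Implied GPUs per system: {gpus_per_system:.1f}")   # ~8
```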
To clarify, a legitimate benchmark for training is to calculate the running cost, not the capex cost, because obviously the latter drops dramatically with the number of models you train. But to put it into context, Meta wants to spend $50B on AI this year alone, and it already has 150x the compute of DS. The very real math going through investors' heads is: what's stopping Zuck from taking $10B of that and mailing a $100 million signing bonus to every name on the R1 paper?
The $6M that is thrown around is from the DS V3 paper and is for the cost of a single training run for DeepSeek V3 - the base model that R1 is built on.
The number does not include the cost of personnel, experiments, data preparation, or chasing dead ends, and most importantly, it does not include the reinforcement learning step that made R1 good.
Furthermore, it is not factored in that both R1 and V3 are built on top of an enormous amount of synthetic data that was generated by other LLMs.
Comparing the cost of buying with the cost of running is weird. It's not like they built a new cluster, trained just this one model, and then incinerated everything.
They bought between 10k and 50k of them before the US restrictions came into place. Sounds like DeepSeek gets to use them for training, as they were profitable (could still be, not sure).
Electricity in China, even at residential rates, is 1/10th the cost it is in CA.
I think the salient point here is that the "price to train" a model is a flashy number that's difficult to evaluate out of context. American companies list the public cloud price to make it seem expensive; Deepseek has an incentive to make it sound cheap.
The real conclusion is that world-class models can now be trained even if you're banned from buying Nvidia cards (because they've already proliferated), and that open-source has won over the big tech dream of gatekeeping the technology.
Over the last few days people have asked me whether I think NVIDIA is fkd. It still takes two H100s to run inference on DS v3 671b at <200 tokens per second.
There are different versions of the model, as well as different levels of quantization to run them at.
Some variants of DeepSeek-R1 can be run on 2x H100 GPUs, and some people managed to get quite decent results with an even more heavily distilled model running on consumer hardware.
For DeepSeek-V3 even with 4bit quantization you need more like 16x H100.
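A rough way to see why memory is the constraint: weight storage alone is parameter count times bytes per weight, and since V3 is a mixture-of-experts model, all 671B parameters have to be resident even though only a fraction is active per token. This sketch counts weights only and ignores KV cache, activations, and runtime overhead:

```python
# Rough VRAM needed just to hold DeepSeek-V3's weights at various precisions.
# Weights only -- no KV cache, activations, or framework overhead.
params = 671e9            # total parameters (MoE: all must be resident)
h100_vram_gib = 80        # usable VRAM per H100, roughly

for bits in (16, 8, 4):
    weight_gib = params * bits / 8 / 2**30
    gpus = weight_gib / h100_vram_gib
    print(f"{bits}-bit weights: {weight_gib:,.0f} GiB -> >= {gpus:.1f} H100s")
```

Weights alone at 4-bit would fit in far fewer than 16 cards; the larger counts presumably come from KV cache, batching headroom, and parallelism overhead in real serving setups.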