DeepSeek could represent Nvidia CEO Jensen Huang's worst nightmare

zerof1l · 2025-01-28T13:27:34 1738070854

Could someone explain as to why DeepSeek is bad for Nvidia?

The demand for Nvidia GPUs should now go up. Now anyone can run a GPT-like model by themselves. It's a prime time for businesses to start investing and setting up on-prem infra for that. I know some have been avoiding ChatGPT due to legal concerns and sensitive data.

InkCanon · 2025-01-28T13:44:55 1738071895

People here quoting Jevons paradox are grossly misapplying it. Jevons paradox will predict an increase in usage of compute for AI. Not Nvidia specifically. The specific insane valuation of Nvidia comes from an unending stream of hundreds in billions in big tech capex, constantly buying cutting edge GPUs, because on SV scaling laws are practically religious commandments. It remains to be seen how this affects Nvidia, but it is not a case of Nvidia like many posts here imply, business wise.

intpx · 2025-01-28T15:52:53 1738079573

Except there has yet to be a breakthrough on anything that isn’t Nvidia, using anything that isn’t cuda/nccl etc

That day is certainly coming, but today is not that day

kandesbunzler · 2025-01-28T13:49:34 1738072174

These are all NVDA owners coping, they dont want rational thoughts. The hit in market cap is absolutely deserved, especially now with the threat of tarrifs.

SR2Z · 2025-01-28T14:21:15 1738074075

NVDA has an exceptionally reasonable P/E ratio and is a great pick on fundamentals alone. The tariffs are not going to make other graphics cards + their software 2x better.

kandesbunzler · 2025-01-28T14:29:15 1738074555

pure cope. Hyperscalers already have a ton of gpus who just got 90% more efficient. They will absolutely reduce spend especially once the tariffs hit lol

numpad0 · 2025-01-28T14:57:06 1738076226

from non-investor perspective it's a long awaited sign of eventual profitability in the money pit and 10x sales potential. Not trying to debunk you, to me it's just proof as to how unrelated tech and market worlds are

kandesbunzler · 2025-01-28T16:57:47 1738083467

except that deepseek r1 isnt a gigantic leap in terms of actual intelligence. These models still have questionable use, I have no doubt that more people will want to use them now but its not going to compensate for how much more efficient they are now. Not even close.

SR2Z · 2025-01-29T01:13:12 1738113192

Do you think that the data centers which have already been built are sitting idle, and that's why people are tripping over themselves to spin up new ones?

It's an objective fact that many people find these things useful. Whether or not they're willing to PAY is another issue.

SR2Z · 2025-01-29T01:11:06 1738113066

I mean nobody can pretend that this level of investment will continue forever, but even before the LLM hype NVIDIA was printing cash.

edoloughlin · 2025-01-30T14:15:22 1738246522

Isn't CUDA a pretty good moat for Nvidia?

ogrisel · 2025-01-28T13:49:20 1738072160

I don't understand why it's bad for Nvidia either.

The fact that DeepSeek-R1 is so much better than DeepSeek-V3 at various important tasks means that Chain-of-though / thinking-before-answering models are better. But they are also more compute intensive at inference time than their instruction non-thinking counterparts.

So even if the DeepSeek-V3 pretraining + GRPO COT post-training procedure was cheaper than anticipated to reach o1 grade performance, inference is still costly, even if you use a distilled model.

bildung · 2025-01-28T13:57:31 1738072651

Deepseek offers API pricing directly on their website, so it's pretty easy to compare inference costs indirectly: It's $60.00 vs. $2.19 for 1M output tokens. Openai is 27x as expensive.

thefounder · 2025-01-28T13:31:01 1738071061

If I can write an email from my small and cheap phone why would I buy that big mainframe? The chips to do the job become cheap. The high end chips will become a niche again for research and mil. stuff.

wongarsu · 2025-01-28T14:20:45 1738074045

But if both the mainframe and personal computer are from IBM, is that scenario really bad for them?

If there was serious competition for cards to do GPU training that might spell trouble for Nvidia. But so far we only went from needing an absurdly huge cluster of Nvidia cards to a huge cluster of Nvidia cards, with Nvidia having a near-monopoly on cards used in deep-learning training all the way down to networks that fit on single cards.

IshKebab · 2025-01-28T13:31:44 1738071104

Because the big mainframe will write a much better email?

thefounder · 2025-01-28T13:33:24 1738071204

It looks like that’s not the case anymore. That’s what this is all about.

senko · 2025-01-28T13:47:19 1738072039

It still is. The DeepSeek R1 is 670B parameters, requiring more than 300GB of video ram (or unified memory), even if using 4-bit quantized models.

amazingamazing · 2025-01-28T13:55:39 1738072539

How much video RAM was needed for similar performance a year ago?

senko · 2025-01-28T14:21:41 1738074101

Roughly the same, as our collective goalposts have shifted, and are still shifting.

What was amazing output last year is today's slop.

NV is in no immediate danger; medium/long term, to anyone knowing what "75% gross margin" means it's no secret that there will be serious threats. You don't need DeepSeek for that particular realization.

amazingamazing · 2025-01-28T14:29:59 1738074599

Which open source model last January gave you similar results with the same amount of video ram?

talldayo · 2025-01-29T02:48:36 1738118916

Nomic Falcon, Mixtral, YALM, Bloom, Grok-1 and Jamba, just to name a few of the foundation models.

DannyBee · 2025-01-28T14:09:56 1738073396

That only works until it's good enough that nobody cares about the difference.

IshKebab · 2025-01-28T20:38:58 1738096738

That will only happen when we have AGI and at that point the world has much bigger problems.

DannyBee · 2025-01-30T00:20:43 1738196443

This is definitely one view, but not sure I buy it at all.

History is littered with things that were made orders of magnitude better, people who claimed nobody would be happy until that happened, and then nobody writ large cared .

Tech in particular has a grand history of claiming we need to produce the technically best thing, but in practice, the market almost always chooses somewhere in the range of useful products, which is not the same at all - the technically best thing has rarely, if ever, become the market leader.

chrsw · 2025-01-28T15:00:38 1738076438

I still don't get why this is bad for NVIDIA either. If anything, it brings more people into their ecosystem. NVIDIA is the AI hardware/platform company right now and people will deploy AI for more tasks if it becomes cheaper to do so.

But there's a more subtle point here which I don't see a lot of people talking about, maybe because they know more about this than me. Why wouldn't frontier model developers take DeepSeek R1 techniques and make their models even better or even larger? Or another way to ask: Are the DeepSeek R1 innovations only for making models cheaper (and slightly worse) or can the algorithms developed by the DeepSeek team be scaled up to make more powerful frontier models? Leading edge model developers don't just care about cost, they also want to release models with maximum capabilities. And as we've seen over the past few years, they're willing to pay almost anything to achieve this.

I think most AI researchers know there are still many things to explore in this space so disruptive innovations shouldn't be seen as a bubble burst but as more opportunity.

NVIDIA is in an extremely strong position right now. Even if someone has a major breakthrough on the hardware design side that dramatically lowers the cost of compute for AI workloads (which is highly unlikely), NVIDIA will just create their own implementation that will outperform the original since they have a stranglehold on an entire stack-up of technology: circuits, drivers, libraries and software.

dgreensp · 2025-01-28T16:07:09 1738080429

You’re absolutely right. If there are corrections, it’s only because people have gone a little crazy. People sometimes get enamored with the idea of one country or company dominating in a winner-takes-all situation. OpenAI has been trying to spin narratives in which the only rational move is for everyone to invest all of their money in OpenAI, immediately.

I don’t think the Nvidia CEO is losing any sleep. He doesn’t have a planet-sized ego, like some. They are still selling pickaxes during a gold rush, as the saying goes. And the stock is still up 100% in the past year. CEOs don’t control the stock price (much as some try), anyway, the market does. Also, the market is insane. You also can’t control the competition, or what new innovations come along. You just have to run a good business, which they are doing.

Innovation is good for all players. Reality-checks are good. Nothing here is unexpected. There are lots of smart researchers in the world. Software innovations that make better use of hardware are expected. People just like drama.

Like if Tim does a better job at a skateboard trick that has always been Tom’s thing, people want to be that kid who is the one to say, “Oh, snap!!” and won’t stop talking about it at school, because they were there. And how it’s so mind-blowing and previously inconceivable.

DannyBee · 2025-01-28T14:09:23 1738073363

Inference was already cheapish - businesses already can run gpt-like models by themselves. NVIDIA sales are not driven by inference spending, but training spend. While sure, maybe now people spend more on inference, NVIDIA was not an investment based money on selling GPUs for inference, because it requires so much fewer resources.

Also, inference is easier to get disrupted in just because of how it works.

All told, this makes it a much less likely play.

Finally, the assumption that it goes up assumes people want on-prem infrastructure for this (in your case). Maybe true, maybe not.

Overall, going from being a "sure" thing selling 200m worth of clusters to random companies hand over fist to not necessarily being able to do that is definitely "bad'.

There are other common claims about why it will increase demand, and they rely on different assumptions (that you won't hit a good enough point quickly, etc)

ecocentrik · 2025-01-28T14:38:43 1738075123

Is anyone canceling their orders? Has anyone announced that the efficiency savings from Deepseek's innovations mean we will have superhuman AGI by the end of the year using existing hardware? Or that existing hardware fully covers all of the expected training demand for the next year?

Or will all of the efficiency savings get immediately absorbed by the demand for better performance and feed the demand for inference?

DannyBee · 2025-01-28T23:05:39 1738105539

They don't exactly release this data, as you know, which is why it's so volatile - nobody has any real idea, just best guesses, and lots of different smart people have lots of different opinions..

But i agree in practice that over time nvidia's path now depends heavily on the answer to "Or will all of the efficiency savings get immediately absorbed by the demand for better performance and feed the demand for inference?"

Before, their next 5-10 years did not really depend on meaningful efficiency savings existing and getting them absorbed - nobody expected meaningful efficiency savings.

Now it does depend on that

ecocentrik · 2025-01-29T13:31:03 1738157463

There have been continuous efficiency savings innovations for both training and inference over the last few years. Discovering and absorbing those gains is a normal part of the development process. It occurs in every industry and is facilitated in computer science by mechanisms like open source and published papers.

Those savings don't usually make national news because they aren't the target. They aren't "man on the moon" moments, they are "our cannonball flies 96% as far as yours using 30% less powder" moments.

This is still a race for tools that essentially brute force their way to greater utility. Until we hit a utility quotient or hit a wall, the race doesn't really end. The quickest and easiest path to advantage in the race is still more compute.

DannyBee · 2025-01-30T00:25:33 1738196733

"There have been continuous efficiency savings innovations for both training and inference over the last few years."

I'm aware - i've run the relevant teams at large scale companies like Google :)

Is your claim anyone has made 98% total vs, say, 5 years ago?

If so, evidence please?

Otherwise, seems sort of irrelevant.

AFAIK, nobody has come anywhere close to this, and the only possible way they would is through externalizing roughly all costs onto someone else.

(IE certainly google TPU's are much more efficient than 5 years ago, but since you have to account for development cost, they are not saving 98% vs 5 years ago)

ivan_gammel · 2025-01-28T13:51:55 1738072315

Nvidia was and still is overpriced. Yes, AI is now on the mainframe stage. Soon it will go to consumer devices with a billion of potential users and the training and the required hardware for it will be very different: much less power consumption, continuous training etc, just like our or animal brain. Consumers did buy computers for $1000 in 1990s. They may buy a robot for $10000 tomorrow. The benchmark for the market cap in this segment are car manufacturers or Intel in 1990s.

throwawaymaths · 2025-01-28T13:31:19 1738071079

i believe the article is making the case that the market has already priced in the expectation that training (but especially training which is what NVIDIA is useful for) and inference will be continually more expensive and deepseek implies that the hardware required to train and infer really good models has a lower ceiling.

easygenes · 2025-01-28T14:04:29 1738073069

Mostly it just means that we’ve got a new tool we can leverage which is a positive feedback for improving models, and will ultimately increase both train and inference demand and shorten timescales for deeper capabilities.

nijuashi · 2025-01-28T18:43:27 1738089807

Absolutely agree on this. 1. DeepSeek just inspired a LOT of startups to develop their own as you no longer need to be a tech-giant to compete on training. 2. Companies with sensitive info will now buy their own GPUs to run their models locally as the range of application increased (as it did with Llama3) 3. As with 2, new services that were prohibitive with ChatGPT API will spring up. It’s difficult to reliably rent GPUs for services even with enough money, so people will buy more GPUs to host it themselves.

I understand if new CPU/GPU can outperform Nvidia and DeepSeek was developed using another GPU, but this is not the case. Lower requirement for higher performance historically never reduced the need for computational capacity. There is very poor reasoning for this move other than purely “technical” (trading-wise) reasons.

headcanon · 2025-01-28T14:19:24 1738073964

Agreed, I believe AI/LLMs will create induced demand for compute long-term, even if we get more efficient models (long-term meaning >6mo). Markets don't look ahead for more than a quarter or two though, so it makes sense there would be a correction based on the news.

nashashmi · 2025-01-28T14:13:12 1738073592

It is not bad for Nvidia. It is bad for hype. Nvda position was good for being a company selling chips with lots of power. But now that not much power is needed, reliance on nvidia is less. And anyone with simpler processors has capabilities to sell AI chips.

Nvidia will still sell many chips. It just won’t be the only one capable of selling them. The hype is gone. The moat is gone. The excitement and enthusiasm from investors is gone. This is called a correction.

baal80spam · 2025-01-28T14:54:01 1738076041

> The moat is gone.

CUDA begs to differ.

samsartor · 2025-01-28T13:59:27 1738072767

I think it's also generally understood that Nvidia owns the training space, but not the inference space. They have a lot more competition there and the margins are smaller. More people running AI models is still good for them, but a drop in the bucket compared to the money they were making from training clusters being built out. And I think everyone just realized they can probably make do with their existing clusters.

Ekaros · 2025-01-28T13:32:24 1738071144

Question really is does the demand go up at current prices of units? Thus do current profit levels keep.

If the training was too expensive previously is it now cheap enough or does it need to be even cheaper? As if it weren't too expensive everyone who were going to invest did. And if it is still too expensive, well prices have to go lower or more efficiency has to be found.

I am not entirely sure if there is huge amount of unmet demand for training.

amazingamazing · 2025-01-28T13:28:33 1738070913

Given some demand, D - more efficiency means less supply is needed to meet demand D, thus, bad for Nvidia.

aftbit · 2025-01-28T13:50:38 1738072238

That assumes demand remains constant. Maybe lowering the overall cost to train a model will mean that more people want to train models, thus raising the demand.

amazingamazing · 2025-01-28T13:53:57 1738072437

Maybe, maybe not. Personally I find it unlikely. There are not a huge amount of folks needing to train their own model.

baq · 2025-01-28T14:08:03 1738073283

There are a lot of folks lining up to use a smart model. This takes tokens. I’m not convinced nvidia is blown up by this news at all. The trade has become less crowded, that is true.

amazingamazing · 2025-01-28T14:09:10 1738073350

if things become too efficient then you can use commodity compute and don't need GPUs at all. I'm not sure why you would think breakthroughs in efficiency would be good for nvidia. eventually a regular mac or PC will be able to do what you need a H100 to do now. that won't be good for nvidia.

baq · 2025-01-28T14:14:15 1738073655

My bet is there's no such thing as 'too efficient' in this space. If you can get a very good model on a small device, it's going to be totally amazing on a huge GPU.

amazingamazing · 2025-01-28T14:21:24 1738074084

I bet you're wrong - there are already massive diminishing returns in the best models from 2024 vs 2023. This idea that you can just through more compute and it scales with performance is fiction. You do get more performance with more compute, but it doesn't scale, and it's a waste of money, as shown with deepseek.

this conversation reminds me of people when the PS2 came out saying that by 2010 games would look literally better than real life, because they thought graphics quality would exponentially improve...

baq · 2025-01-28T14:57:54 1738076274

I would agree except I think it’s 1997. I’ve gotten DeepSeek to mostly solve a not very complex problem in a somewhat obscure domain (home assistant automations) in 214 seconds of its own ‘thinking’ and if you can get this two orders of magnitude lower this unlocks completely new use cases ie. demand.

feverzsj · 2025-01-28T14:25:57 1738074357

The market is already losing patience on "profitable AI". Deepseek only improves cost with worse performance, while even chatgpt is still far from profitable.

WhyNotHugo · 2025-01-28T13:43:22 1738071802

The US has prohibited nvidia from exporting to China, so DeepSeek (the company) won’t be a customer, and others wanting to self-host their models have the freedom to pick any hardware vendor.

Cthulhu_ · 2025-01-28T14:04:38 1738073078

This thing underlines how nvidia is in trouble; it proves (unless Deepseek used servers outside of china) that banning nvidia from china does not give them any kind of advantage anymore.

elorant · 2025-01-28T14:17:36 1738073856

What happens with excess hardware that’s been already bought? It could take years for all of that to be absorbed by smaller companies. And secondly, if this opens the era of self-training models then why go for a 671B one and not a smaller that’s fine-tuned to a company’s specific needs and can then run on consumer GPUs. At 70B with 8-bit quantization you’re good with just three 3090s.

pixl97 · 2025-01-28T14:25:40 1738074340

Deepseek isn't AGI yet and really still has a long way to go.

Na, we have a long way to go with models, especially when you start adding different modes. We'll still need a metric shitload of compute for a long time.

numpad0 · 2025-01-28T14:51:54 1738075914

Yeah, to me it looks like a lot of investors were into AI for skewed agendas and 1984 style fantasies and panicking after told it's not about that.

j_not_j · 2025-01-28T13:48:05 1738072085

I don't for a minute believe Deepseek v3 was built with a $6M rental.

Their paper (arxiv 2412:1947) explains they used 2048 H800s. A computer cluster based on 2048 GPUs would have cost around $400M about two years ago when they built it. (Give or take, feel free to post corrections.)

The point is they got it done cheaper than OpenAI/Google/Meta/... etc.

But not cheaply.

I believe the markets are overreacting. Time to buy (tinfa).

segmondy · 2025-01-28T13:58:41 1738072721

They pointed out that the cost calculation is based on if those GPUs were rented at $2/hr. They are not factoring in the prior cost of buying those H800s because they didn't buy it to build R1. They are not factoring in the cost to build v2, or v2.5. The cost is to build V3. The cost to build R0 and R1 on top of v3, seems far cheaper and they didn't mention that. They are not factoring in the cost to build out their datacenter or salary. Just the training cost. They made it clear. If you could rent equivalent GPUs at $2/hr, it would cost you about $6million.

"Assuming the rental price of the H800 GPU is $2 per GPU hour, our total training costs amount to only $5.576M. Note that the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs associated with prior research and ablation experiments on architectures, algorithms, or data."

V3 was released a bit a month ago, V3 is not what took the world by storm but R1. The price everyone is talking about is the price for V3.

ecocentrik · 2025-01-28T14:12:07 1738073527

If this weren't an attempt to sell a false equivalency, at least one story would have details on the equivalent rental cost of compute used to train closed source frontier models from OpenAI, Anthropic, Mistral... Lack of clarity makes it a story.

kamaal · 2025-01-28T14:54:34 1738076074

>>Just the training cost. They made it clear. If you could rent equivalent GPUs at $2/hr, it would cost you about $6million.

This is still quite impressive, given most people are likely to buy cloud infrastructure from AWS or Azure than build their own datacenter. So the Math checks out.

I don't think compute capacity built already will go waste, likely more and bigger things will get built in the coming years so most of it will be used for that purpose.

achempion · 2025-01-28T14:19:59 1738073999

thanks for the explanation, these facts are completely overlooked in mass media in favour of catchy headlines

shusaku · 2025-01-28T14:00:30 1738072830

You’re confusing the metric for reality. The point is to compare the cost of training in terms of node hours with a given configuration. That’s how you get apples to apples. Of course it doesn’t cover building the cluster, housing the machine, the cleaning staff’s pension, or whatever.

aftbit · 2025-01-28T13:59:54 1738072794

The math they gave was 2,788,000 H800 GPU hours[1], with a rental price of $2/GPU-hour[1], which works out to $5.6M. If they did that on a cluster of 2048 H800s, then they could re-train the model every ~1400 hours (~2 months).

If they paid $70,000 per GPU[2] plus $5000 per 4-GPU compute node (random guess), then the hardware would have cost about $150M to build. If you add in network hardware and other data-centery-things, I could see it reaching into the $200M range. IMO $400M might be a bit of a stretch but not too wildly off base.

To reach parity with the rental price, they would have needed to re-train 70 times (i.e. over 12 years). They obviously did not do that, so I agree it's a bit unfair to cost this based on $2M in GPU rentals. Why did they buy instead of rent? Probably because it's not actually that cheap to get 2048 concurrent high-performance connected GPUs for 60 days. Or maybe just because they had cash for capex.

1: https://stratechery.com/2025/deepseek-faq/

2: https://www.tomshardware.com/news/price-of-nvidia-compute-gp...

aisio · 2025-01-28T13:51:49 1738072309

2048 GPUs cost $400m? pretty sure the GPUs don't cost 200k each?

bottlelion · 2025-01-28T13:55:14 1738072514

And a pile of GPUs doesn't really do much good without servers, racks, networking, power, cooling, and a building to house it all in.

moralestapia · 2025-01-28T14:06:00 1738073160

And a pile of "servers, racks, networking, power, cooling, and a building to house it all in" doesn't cost 10-20x more than the GPUs.

If that were the case, Cooler Master would be the trillion dollar company, lmao.

dessimus · 2025-01-28T15:47:38 1738079258

If Cooler Master could charge enterprise computing prices for their consumer level gear, they probably would be.

moralestapia · 2025-01-28T15:52:31 1738079551

Something like an H100 is definitely a feat of engineering, though.

Nothing prevents Cooler Master from releasing a line of GPUs equally performant and, while at it, even cheaper. But when we measure reality, after the wave function of fentanyl and good intentions collapses ... oh yeah, turns out only nVidia is making those chips, whoops ...

dathinab · 2025-01-28T14:30:30 1738074630

Looking around a bit the price was ~70k$USD _in China_ around the time they where released in 2023, cheaper bulk sells where a thing, too.

Note that this are the China prices with high markup due to export controls etc.

The price of a H800 80GiB in the US is today more like ~32k$USD .

But for using H800 clusters well you also need as fast as possible interconnects, enough motherboards, enough fast storage, cooling, building, interruption free power etc. So the cost of building a "H800" focused Datacenter is much much higher then multiplying GPU cost by number.

Still $400m seem unlikely.

317070 · 2025-01-28T13:56:31 1738072591

I find reports of these GPUs costing $70k each 6 quarters ago [0]. So, maybe not $400m, but a $100m+ number seems about right.

[0] https://www.tomshardware.com/news/price-of-nvidia-compute-gp...

intpx · 2025-01-28T15:51:10 1738079470

You can’t buy the gpus individually, and even if you can on a secondary market, you can’t use them without the baseboard, and you can’t use the baseboard without a compatible chassis, and a compatible chassis is full of CPUs, system memory etc. on top of that, you need a fabric. Even if you cheap out and go RoCE over IB it’s still 400gbs hcas, optics and switches

Yea, a node in a cluster costs as much as an American house. Maybe not on its own, but to make it useful for large scale training, even under the new math of deepseek, it costs as much as a house.

j_not_j · 2025-01-28T14:07:41 1738073261

See https://www.nextplatform.com/2023/05/01/just-how-big-are-nvi...

They estimated $200k for a single NVIDIA GPU-based CPU complete with RAM and networking. That's where my number came from. (RAM and especially very-high-speed networking is very expensive at these scales.)

yorwba · 2025-01-28T14:24:06 1738074246

You mean this part?

"Add it all up, and the average selling price of an Nvidia GPU accelerated system, no matter where it came from, was just under $180,000, the average server SXM-style, NVLink-capable GPU sold for just over $19,000 (assuming the GPUs represented around 85 percent of the cost of the machine)"

That implies they assumed an 8-GPU system. (8 × $19,000 = $152,000 ≈ 85% × $180,000)

j_not_j · 2025-01-28T15:17:30 1738077450

InkCanon · 2025-01-28T14:19:28 1738073968

To clarify, a legitimate benchmark for training is to calculate the running cost, not capex cost. Because obviously the latter would drop dramatically with the number of models you train. But to put into context, Meta wants to spend 50B on AI this year alone. And it already has 150x the compute of DS. The very real math going through investors head is - what's stopping Zuck from taking 10B of that and mailing a 100 million signing bonus to every name on the R1 paper?

cpldcpu · 2025-01-28T14:43:22 1738075402

The $6M that is thrown around is from the DS V3 paper and is for the cost of a single training run for DeepSeek V3 - the base model that R1 is built on.

The number does not include cost for personell, experiments, data preparation, chasing dead ends, and most importantly, it does not include the reinforcement learning step that made R1 good.

Furthermore, it is not factored in that both R3 and V1 are build on top of an enormous amount of synthetic data the was generated by other LLMs.

dtech · 2025-01-28T14:02:28 1738072948

Comparing cost of buying with cost of running is weird. It's not like they build a new cluster, train just this one model, and then incinerate everything.

jgalt212 · 2025-01-28T13:54:25 1738072465

> A computer cluster based on 2048 GPUs would have cost around $400M about two years ago when they built it.

OK, but does this quant fund have this amount of a spare resources to take a flyer on a vanity project?

TypingOutBugs · 2025-01-28T13:57:39 1738072659

They bought between 10k and 50k of them before the US restrictions came into place. Sounds like DeepSeek gets to use them for training, as they were profitable (could still be, not sure).

ahzhou · 2025-01-28T14:12:10 1738073530

You can easily do a fermi estimate based on the information given. They are comparing GPU hours.

See: https://planetbanatt.net/articles/v3fermi.html

K0balt · 2025-01-28T14:00:10 1738072810

If they bought them outright, they might have paid 60m, (GPU only) . After infrastructure, maybe 100M.

Calling the training load for DeepSeek 6% of the value of that cluster seems generous. It probably used less of the recoverable value than that.

SR2Z · 2025-01-28T14:19:20 1738073960

Electricity in China, even at residential rates, is 1/10th the cost it is in CA.

I think the salient point here is that the "price to train" a model is a flashy number that's difficult to evaluate out of context. American companies list the public cloud price to make it seem expensive; Deepseek has an incentive to make it sound cheap.

The real conclusion is that world-class models can now be trained even if you're banned from buying Nvidia cards (because they've already proliferated), and that open-source has won over the big tech dream of gatekeeping the technology.

TechDebtDevin · 2025-01-28T13:54:41 1738072481

Over the last few days people have asked me if they think NVIDIA is fkd.. It still takes two H100s to run inference on the DS v3 671b @ <200 tokens per second.

htrp · 2025-01-28T14:04:00 1738073040

only 2 ? what kind of h100s do you have?

dathinab · 2025-01-28T14:47:45 1738075665

There are different versions of the model as well as using it with different levels of quantization.

Some variants of DeepSeek-R1 can be run on 2x H100 GPUs, and some people managed to get still quite decent results with a even stronger distilled mode running it on consumer hardware.

For DeepSeek-V3 even with 4bit quantization you need more like 16x H100.

TechDebtDevin · 2025-01-28T14:58:08 1738076288

I meant quantized versions but yea, I get your point.

aftbit · 2025-01-28T13:51:56 1738072316

I couldn't find TINFA on Yahoo Finance but I bought INFA assuming that was close enough. Thanks for the financial advice. :P

/s TINFA -> this is not financial advice

baal80spam · 2025-01-28T14:14:49 1738073689

Now that you bought, he will dump! :-P

cpldcpu · 2025-01-28T14:48:10 1738075690

All this media frenzy around DS V1 makes me feel sick to my stomach.

It increased the noise in the AI space by orders of magnitude. Every media outlet is bombarding you with a relentless torrent of half-true information, exaggered interpretation of single facts and speculation.

Sometimes I wonder whether this is amplified by a state actor?

Also curious, how Minimax-01, which is also an excellent model with impressive improvements, went by completely unnocited.

The only good thing is that this certainly put an end to OpenAIs price gauging - pretty sure that $200/month individual plan is not the limit of their imagination.

numpad0 · 2025-01-28T14:37:12 1738075032

It can't get sillier than this. All that DeepSeek did were showing that NVIDIA products hold more value than believed, and that US dominance more illusory than thought. It has zero negative bearing on eventual path to AGI.

And that's somehow supposed to be end of NVIDIA? Hello?

chvid · 2025-01-28T14:00:03 1738072803

It is the end of neural language model scaling as described here:

https://openai.com/index/scaling-laws-for-neural-language-mo...

And a return to more normal world where progress comes from refinement of algorithms and approach rather than brute force compute.

Deepseek is just the poster child.

K0balt · 2025-01-28T14:04:25 1738073065

If we look at the processing requirements in nature, I think that the main trend in AI going forward is going to be doing more with less, not doing less with more, as the current scaling is going.

Thermodynamic neural networks may also basically turn everything on its ear, especially if we figure out how to scale them like NAND flash.

If anything, I would estimate that this is a space-race type effort to “win” the AI “wars”. In the short term, it might work. In the long term, it’s probably going to result in a massive glut in accelerated data center capacity.

The trend of technology is towards converging with or doing better than natural processes, not doing it 100000x less efficiently. I don’t think AI will be an exception.

If we look at what is -theoretically- possible using thermodynamic wells, with current model architectures, for instance, we could (theoretically) make a network that applies 1t parameters in something like 1cm2. It would use about 20watts, back of the napkin, and be able to generate a few thousand T/S.

Operational thermodynamic wells have already been demonstrated en silica. There are scaling challenges, cooling requirements, etc but AFAIK no theoretical roadblocks to scaling.

Obviously, the theoretical doesn’t translate to results, but it does correlate strongly with the trend.

So the real question is, what can we build that can only be done if there are hundreds of millions of NVIDIA GPUs sitting around idle in a few years? Or alternatively, if those systems are depreciated and available on secondary markets?

What does that look like?

wongarsu · 2025-01-28T14:28:31 1738074511

Also evidenced by 3B and 7B models becoming ever more capable, gaining qualities that were previously only achieved by models a magnitude larger.

Bigger models are more capable, but smaller models can be iterated on faster. For a couple months now most of the impressive achievements have been in increasingly smaller and cheaper models. Deepseek just has the perfect storm of impressive results, accessibility and international rivalry that made it go viral

cpldcpu · 2025-01-28T14:51:58 1738075918

Thats not really correct.

It's actually the beginning of test time scaling. R1 has shown that a very simple reinforcement learning scheme can be used to teach the model how to think in a chain-of-though as an emergent property.

No addition pretraining data needed! Only more compute.

alephnerd · 2025-01-28T13:19:12 1738070352

There was no market reaction when Deepseek r1 was released.

There was no market reaction when Deepseek v3 was released.

There was a massive market reaction when a Deepseek app was released.

There is too much damn irrationality in the AI and AI adjacent market right now, and pop articles like this are only making it worse.

Jevons Paradox absolutely holds for model development, and Deepseek should (and is - all my cybersecurity startup peers are investigating the feasibility of building their own models now) be viewed as an opportunity for smaller or medium stage companies to build their own competitive domain specific models, while not having to pay what is essentially protection money to OpenAI or Anthropic.

The only people at risk with the democratization of model development are the investors in foundational model companies

---------

Also, I hope Deepseek FINALLY reprioritizes distributed systems education in CS.

There are too many "ML Engineers" who do not know the basics of OS or Hardware Engineering (eg. Cannot optimize Mellanox hardware, really dig into workload optimization, etc) and are thus burning resources inefficiently. Most MLEs I meet now just import a package and glue code together WITHOUT also understanding the performance capabilities that their compute has.

Most universities in the US either don't require OS or CompArch classes for CS majors (eg. Harvard) or dumb them down significantly (eg. Not going deep into Linux kernel implementation, scheduling, etc).

This is why the entire cybersecurity industry has largely shifted to Israel and India, and the skillset overlaps significantly with MLOps and MLEng (if you know how to implement spinlocks in C you can easily be taught CUDA)

<"Old Man yelling at clouds" rant over>

InkCanon · 2025-01-28T13:41:06 1738071666

Jevons paradox has been quoted to death here - it really needs pointing out that it points to a commodity in general, not Nvidia. Nvidia is not selling just coal, it's selling a super purified ultra powerful coal only it can make. Jevons paradox just says coal (compute) usage will increase. Not that this specific type of coal will.

And this is quite separate from the business of it. For example Jevons paradox is quite apparent in the airline industry, but airlines are notorious for possibly never having net made a profit because of high capex and aggressive pricing competition. Jevons paradox is not any reason at all for a runaway valuation of a company.

alephnerd · 2025-01-28T13:43:04 1738071784

I agree, but the existing moat that Nvidia has due to CUDA and the entire HPC ecosystem still holds. And this is why it's relevant to treat Nvidia as the majority of the GPU space segment.

Furthermore, it's Nvidia that owns Mellanox - which is the owner of the Infiniband IP which is used in just about every DC or cluster, because no other vendor came as close for interconnect performance

InkCanon · 2025-01-28T14:16:14 1738073774

You're on the dot here. Most people are looking at it wrong. Someone here once put it succinctly - Nvidia isn't selling GPUs, it's selling Infiniband clusters. The whole NVLink/CUDA ecosystem is the key, and a mass of separate players (Cerebras, Google TPUs for inference, Mojo for CUDA, AMD/Intel/Huawei etc for GPUs, etc) are very unlikely to break that. In fact NVLink is so powerful, based on the DeepSeek paper it appears to be nerfed (classic Nvidia market segmentation strategy), and a big part of squeezing their compute was hacking around that.

That said, the reason I don't invest Nvidia is that they're selling one more thing - AI legitimacy. Every tech shop is buying Nvidia because all their CEOs need to go to the board and say, we've got so and so AI strategy. Sure as hell Meta isn't spending 50 billion a year because AI makes them so much damn money (supposedly ML saved their ass when Apple put on App Tracking Transparency, but 99% suds it's not the ML that needs a gigawatt GPU cluster to run). And the very real calculus for investors/CEOs is that if DeepSeek made this on a thousandth of the budget (conflicting reports but high probably DS has a fraction of the compute), whYs stopping Zuckerberg from taking 10B of that capex spend and offering every name on that DeepSeek paper a 10 million salary?

lm28469 · 2025-01-28T13:47:05 1738072025

> Jevons paradox

Nobody knew what it was yesterday and now everyone is using it as if they came up with the idea themselves

https://trends.google.com/trends/explore?date=today%205-y&q=...

SilasX · 2025-01-28T14:24:12 1738074252

What are you talking about?

I see two pages, just of submissions on HN, before early last year:

https://hn.algolia.com/?dateEnd=1706400000&dateRange=custom&...

I see over 900 results just for comments:

https://hn.algolia.com/?dateEnd=1706400000&dateRange=custom&...

I personally have brought it up when it's relevant, like when people think energy efficiency improvements always make total use go down:

https://news.ycombinator.com/item?id=14764845

https://news.ycombinator.com/item?id=19543772

And when a more efficient website showed more load on the server (because it widened usage to a broader audience):

https://news.ycombinator.com/item?id=13602792

How about the more charitable hypothesis: It's a relevant dynamic because this is a situation where the paradox most strongly applies: efficiency improvement in use of an input whose output has insatiable demand. And so people bring it up, even if they just recently learned it from another comment.

It's okay to say something true ... even if someone else already said it elsewhere.

lm28469 · 2025-01-28T16:23:52 1738081432

According to google search ~1/3rd of all hits for "Jevons paradox" appeared since January 25 2025

It's the fancy big word of the day if you want to appear smart on social media.

SilasX · 2025-01-28T16:39:51 1738082391

Yes, and again, that doesn't make it wrong or irrelevant. Believe it or not, the popular wisdom can still be correct!

lm28469 · 2025-01-28T16:52:20 1738083140

I never said it was wrong, I'm just saying a lot of people became instant experts on topics they had no idea even existed 24 hours ago, including people with grand techno prophecies and people giving financial advices

SilasX · 2025-01-28T19:21:29 1738092089

If someone says something correct that they just learned, they don't deserve ridicule just because they learned it recently and others are saying it too.

glenstein · 2025-01-28T13:53:48 1738072428

I certainly didn't know about it. Was there any high visibility use of it over social media in the past few days?

lm28469 · 2025-01-28T13:57:08 1738072628

Satya Nadella on twatter

skwee357 · 2025-01-28T13:57:01 1738072621

Welcome to the internet. Astrologers proclaimed the week of DeepSeek, the number of Jevons paradox experts quadrupled.

onlyrealcuzzo · 2025-01-28T14:16:18 1738073778

Correct me if I'm wrong, but Jevons paradox says nothing about profits from selling coal, the growth rates of coal consumption, etc.

Nvidia currently has ~75% margins and ~90% growth. If either of those nobs get turned down slightly, their valuation can tank.

Which is what happened.

If Nvidia's 10-year growth forecast went from ~40% per year (29x in 10 years) to ~30% per year (14x in 10 years) - that's a 50% smaller future company you're expecting.

It's still great for Nvidia. It's just LESS great than people thought before, which means their market cap comes down a ton.

That's what you're all forgetting.

Jlagreen · 2025-01-29T13:25:12 1738157112

Nvidia had 65% margins 5 years ago when they primarily sold gaming GPUs.

What margins do you think Nvidia is charging for RTX which getting more and more expensive every generation?

What margins do you think Project Digits will be for Nvidia? That thing costs 50% more than a RTX card and on the RTX card Nvidia only earns on the chip which is only part of the card. Digits is a 100% Nvidia product.

Does it matter if 1 Blackwell DC GPU or 13 Digits are sold? No, Nvidia has probably an insane margin on Digits and has only one goal, total spread and integration of Nvidia HW in any AI workload. Because then Nvidia can offer SW with super margins.

Think of Nvidia Enterprise AI, Omniverse, Clara, Isaac, DriveSim and many more. If 95% of AI acclerators are from Nvidia then it's only a small step to also use Nvidia SW frameworks.

InkCanon · 2025-01-28T14:23:53 1738074233

Exactly. Any rational person will predict Nvidia will own the space. But projected over a long period, a 10% decrease in growth is massive

xnorswap · 2025-01-28T13:47:27 1738072047

> There was no market reaction when Deepseek r1 was released.

> There was no market reaction when Deepseek v3 was released.

> There was a massive market reaction when a Deepseek app was released.

This signals that there was money on the table by paying attention to the underlying tech.

TechDebtDevin · 2025-01-28T13:49:17 1738072157

Everything I know about ML is self taught. My local Uni has a ML Engineering degree that's relatively new. Every once in a while I look at the curriculum, outside of some higher lvl math, 90% of that four year degree consists of topics I "mastered" in the first 6-12 months of self study trying to build prediction models for various sports.

blackkat · 2025-01-28T14:01:47 1738072907

Would you mind giving some pointers of where to start?

There is a wasteland of blogspam around the topic of getting started with ML and it’s hard to know what is useful at the beginning.

TechDebtDevin · 2025-01-28T14:47:37 1738075657

If I could go back I would start by reading Josh Starmer's Statquest Guide to Machine Learning, and then his guide to AI/Nueral Networks[0]. Starmer does the best job at explaining advanced ML topics in a very beginner friendly way, the books are literally written in the format of a children's book.

Then just start tinkering. I got interested in ML because of sport's analytics and betting markets so I read a lot of papers on that topic and books similar to Bayesian Sports Models in R by Andrew Mack[1].Also, Jake VanderPlas's Python Data Science Handbook is good[2].

Ideally, find a vertical you're interested in where experts have applied ML and read their papers/books and work backwards from there.

[0]: https://statquest.org/statquest-store/ [1]: https://www.goodreads.com/book/show/216487475-bayesian-sport... [2]: https://www.oreilly.com/library/view/python-data-science/978...

alephnerd · 2025-01-28T13:56:58 1738072618

Good for you.

Do you know how to optimize the MTU of an Infiniband interconnect? Do you understand how to schedule multiple models being trained concurrently? Do you know why you cannot directly leverage bare metal compute within a Docker image?

This is important Infra knowledge you need to take full advantage of any model you are training on your own hardware. And this is why Deepseek was successful - they understood the ins-and-outs of systems programming and the H800 architecture to maximize the compute performance they needed to train their model.

beardedwizard · 2025-01-28T14:07:40 1738073260

What about Linux namespaces makes you think it's not running on bare metal? Maybe you are thinking of versions of docker that introduce a Linux hyper visor like macos?

alephnerd · 2025-01-28T14:11:58 1738073518

> What about Linux namespaces makes you think it's not running on bare metal

Nothing!

That's my point! I've met a number of "MLEs" who couldn't push back like that with my very basic "fizzbuzz" question

mhog_hn · 2025-01-28T13:34:57 1738071297

What resources are suggested to get started in this area? Assuming a distributed systems web dev background

alephnerd · 2025-01-28T13:52:54 1738072374

UIUC CS241 - https://pkgamma.github.io/cs241-site-clone-fa19/

UCB's CS162 - https://cs162.org/

MIT 6.1810 - https://kaashoek.github.io/65810-2023/

These were tablestake courses that teach you the basics of OS and Systems Development (eg. Synchronization, Scheduling, etc).

After you understand that, then you'd start digging into HPC classes AND then ideally go into ML.

At least, this was the workflow and mental model I followed a decade+ ago.

mhog_hn · 2025-01-28T16:33:28 1738082008

Thanks!

samvher · 2025-01-28T13:47:12 1738072032

I don't think the part about the market reaction is too surprising, for COVID it took a while for the market to react as well.

Seeing that DeepSeek-R1 has benchmark scores like o1 is not the same as seeing that people actually like it, that it's being adopted, and that stated training costs are getting accepted as credible.

(But I agree that there is a lot of irrationality in these markets.)

high_na_euv · 2025-01-28T13:21:40 1738070500

Or people in charge cannot properly evaluate things :)

It is very possible, when things require advanced technical knowledge and experience

><"Old Man yelling at clouds" rant over>

They are too busy deploying ruby apps and juggling jsons over https to mess with ugly cpp, segfaults, kernel details, hardware intrinsics and semiconductors

alephnerd · 2025-01-28T13:34:18 1738071258

At the scale that the selloffs happened, that meant there were some shifts by institutional investors. These are the kinds of managers that would have technical people on retainer to consult (eg. Guidepoint).

> They are too busy deploying ruby apps and juggling jsons over https

Ruby is "legacy" as well nowadays. Most younger devs I meet tend to really only understand JS and Python, and that too while heavily relying on outside packages or dependencies.

Not a bad thing per say, but if a VC like me has deeper knowledge about OS internals or Mellanox tuning than some of the (American) MLEs in companies they've done due diligence on, something's very wrong with the talent pipeline.

highcountess · 2025-01-28T13:34:32 1738071272

Humans are generally not very good at properly evaluating things in general. They are exceptionally good at believing themselves to be very good at properly evaluating things though.

It is why empires collapse from miscalculations about war by their experts, it is the foundational mental model underlying communism that simply cannot work but keeps being attempted, and it is how countries can be destroyed in every which way while the experts maintain that everything is just fine.

Never underestimate the powers of the ego, the destroyer of worlds. I mean they don’t know what they are doing, but we have advanced technical knowledge and experience, so we should clearly be in charge of all things, including those beyond our narrow scope of advanced technical knowledge and experience.

dagss · 2025-01-28T13:42:26 1738071746

To be fair, capitalism doesn't really work either, you get a very unstable systems with cycles (which people try to stabilize in ad hoc ways), you optimize for the wrong things long term (right now producing much more goods than we need, without factoring in devastating costs that climate change will bring in a few decades, and destruction of nature and eradication of species) and when things crash many people are hurt. And it practically needs wars and destruction to reset wealth distribution every 50 to 100 years.

No communism doesn't work either. I wish there was more alternatives.

And some of the usual examples of "socialism doesn't work" might have worked if US hadn't interfered. E.g. in Chile, no system would have survived US actively supporting Pinochet. We will never know how socialism would have played out there if US (and Soviet too) left it alone.

colinnordin · 2025-01-28T14:20:20 1738074020

R1 and O1 points towards a world where training models will be a small bucket of the overall compute. That doesn't mean the total amount of AI compute will stop accelerating, just that interconnected mega clusters is not the only or most efficient way to run a majority of future workloads. That should be negative news for the company that is currently the only one that is capable of making chips for these clusters, and positive news for the players that can run inference on a single chip, as they will be able to grab more parts of the compute pie.

nashashmi · 2025-01-28T14:25:39 1738074339

Was the US sanctions on chip selling to China worth it? Yesssss. It was. It still is. (But I frown on this action.)

China is good on dumb stupid manufacturing. It does this with so much enthusiasm HNers are left scratching their heads why the excitement. The possibility of China manufacturing the next computing powerhouse was too high. Higher than it could ever be in the US.

The threat to US leadership was that the fastest supercomputer for AI in the world could be in China (the first sputnik moment per Barack Obama). So the chip embargo was to prevent that from happening.

So China did not spend/waste $$ & time building hardware. They spent it on software. Normally the Chinese version is cheap enough and good enough. But this time it was cheap enough and surprisingly terrific enough. Props to Liang the founder for having that game face on.

feverzsj · 2025-01-28T13:52:17 1738072337

Nvidia AI Chip smuggling is a huge business in china. DeepSeek company on its own smuggled 50k h100.

TechDebtDevin · 2025-01-28T13:59:15 1738072755

Its not smuggling if Nvidia vendors openly sell it to them :) I'm pretty sure most of Russia's military equipment is filled with Texas Instrument MCs. These export controls don't work.

dannyfreeman · 2025-01-28T14:04:32 1738073072

Source: trust me

seydor · 2025-01-28T14:10:59 1738073459

I find it hard to believe that serious investors thought that vertical progress in ai had peaked, and it would be only horizontal expansion from now on. We know that it takes decades for this to happen in every industry.

It seems more like , investors were looking for a significant event to pop the obvious bubble if the last year

Terretta · 2025-01-28T14:16:46 1738073806

Which investors, specifically, were looking for a significant event to pop the obvious bubble?

Investors who were actually invested, or onlookers with FOMO who hadn't invested yet?

While I think the "AI" companies invested in are overpriced*, I think the market as a whole hasn't even begun to price in the changes in store across non-tech sectors and industries even if LLMs were to advance no further than now, with all additional progress at the prompting and agentic levels.

* Even here, Jevons paradox applies. So what's being called "priced for perfection" may actually be priced for business as usual with multiplied demand.

childintime · 2025-01-28T13:57:23 1738072643

Time to give our family it's own shared networked AI station, accessible from any of our devices. Then I'll trust it with all of our personal, medical and legal docs. Seems to me the market is expanding, not contracting.

shnp · 2025-01-28T14:50:13 1738075813

Is deepseek’s architecture not scalable anymore? I can see in a couple months some company with access to GPUs create a much larger and better performing model based on the deepseek architecture.

NotYourLawyer · 2025-01-28T13:48:49 1738072129

NVDA could go to zero and he’d still be a multi billionaire. His worst nightmare isn’t very bad.

maupin · 2025-01-28T13:42:40 1738071760

> “We didn’t think China was ever going to compete in AI,” said Dion Hinchcliffe, vice president of the CIO practice at Futurum Research last week.

What? Oh, come on now.

iambateman · 2025-01-28T18:30:22 1738089022

Does DeepSeek make it more possible to use AMD hardware? What chips are possible with DS that weren’t possible before?

stogot · 2025-01-28T13:42:24 1738071744

Did they say which chips they used? And how many? I haven’t seen this in any reporting

lysace · 2025-01-28T13:46:49 1738072009

The DeepSeek V3 paper says they used 2.8 million Nvidia H800 hours in a cluster of 2048 of them.

mtkd · 2025-01-28T13:46:25 1738071985

Wait until some surprise new silicon suddenly appears

Refusing23 · 2025-01-28T13:22:25 1738070545

DeepSeek could represent a much larger amount of costumers to nvidia since now people can get an AI while spending less

ftfy

logical_proof · 2025-01-28T13:38:29 1738071509

Did you mean consumers? If you're going to ftfy to someone you should use spell check.