Could someone explain why DeepSeek is bad for Nvidia?
The demand for Nvidia GPUs should now go up. Now anyone can run a GPT-like model by themselves. It's a prime time for businesses to start investing and setting up on-prem infra for that. I know some have been avoiding ChatGPT due to legal concerns and sensitive data.
People here quoting Jevons paradox are grossly misapplying it. Jevons paradox predicts an increase in overall compute usage for AI, not in demand for Nvidia specifically. Nvidia's specific, insane valuation comes from an unending stream of hundreds of billions in big-tech capex, constantly buying cutting-edge GPUs, because in SV the scaling laws are practically religious commandments. It remains to be seen how this affects Nvidia, but business-wise it is not the clear win for Nvidia that many posts here imply.
These are all NVDA owners coping; they don't want rational thoughts.
The hit in market cap is absolutely deserved, especially now with the threat of tariffs.
NVDA has an exceptionally reasonable P/E ratio and is a great pick on fundamentals alone. The tariffs are not going to make other graphics cards + their software 2x better.
Pure cope. Hyperscalers already have a ton of GPUs that just got 90% more efficient. They will absolutely reduce spend, especially once the tariffs hit lol
From a non-investor perspective, it's a long-awaited sign of eventual profitability in the money pit and of 10x sales potential. Not trying to debunk you; to me it's just proof of how unrelated the tech and market worlds are.
Except that DeepSeek R1 isn't a gigantic leap in terms of actual intelligence. These models still have questionable uses. I have no doubt that more people will want to use them now, but it's not going to compensate for how much more efficient they are now. Not even close.
Do you think that the data centers which have already been built are sitting idle, and that's why people are tripping over themselves to spin up new ones?
It's an objective fact that many people find these things useful. Whether or not they're willing to PAY is another issue.
I don't understand why it's bad for Nvidia either.
The fact that DeepSeek-R1 is so much better than DeepSeek-V3 at various important tasks means that chain-of-thought / thinking-before-answering models are better. But they are also more compute-intensive at inference time than their non-thinking instruction-tuned counterparts.
So even if the DeepSeek-V3 pretraining + GRPO CoT post-training procedure was cheaper than anticipated to reach o1-grade performance, inference is still costly, even if you use a distilled model.
DeepSeek offers API pricing directly on their website, so it's pretty easy to compare inference costs indirectly: it's $60.00 vs. $2.19 per 1M output tokens. OpenAI is 27x as expensive.
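A quick sanity check on that ratio, using only the two prices quoted above (the 5,000-token query in the sketch is an invented illustration, not a figure from the thread):

```python
# Output-token pricing quoted in the comment above.
OPENAI_O1_PER_M = 60.00      # $ per 1M output tokens
DEEPSEEK_R1_PER_M = 2.19     # $ per 1M output tokens

ratio = OPENAI_O1_PER_M / DEEPSEEK_R1_PER_M

# Hypothetical reasoning-heavy query emitting 5,000 thinking + answer tokens.
tokens = 5_000
cost_openai = tokens / 1_000_000 * OPENAI_O1_PER_M
cost_deepseek = tokens / 1_000_000 * DEEPSEEK_R1_PER_M

print(f"price ratio: {ratio:.1f}x")                        # ~27.4x
print(f"per query:   ${cost_openai:.2f} vs ${cost_deepseek:.4f}")
```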
If I can write an email from my small and cheap phone why would I buy that big mainframe? The chips to do the job become cheap. The high end chips will become a niche again for research and mil. stuff.
But if both the mainframe and personal computer are from IBM, is that scenario really bad for them?
If there was serious competition for cards to do GPU training that might spell trouble for Nvidia. But so far we only went from needing an absurdly huge cluster of Nvidia cards to a huge cluster of Nvidia cards, with Nvidia having a near-monopoly on cards used in deep-learning training all the way down to networks that fit on single cards.
Roughly the same, as our collective goalposts have shifted, and are still shifting.
What was amazing output last year is today's slop.
NV is in no immediate danger; medium/long term, to anyone who knows what "75% gross margin" means, it's no secret that there will be serious threats. You don't need DeepSeek for that particular realization.
This is definitely one view, but not sure I buy it at all.
History is littered with things that were made orders of magnitude better, people who claimed nobody would be happy until that happened, and then nobody, writ large, cared.
Tech in particular has a grand history of claiming we need to produce the technically best thing, but in practice, the market almost always chooses somewhere in the range of useful products, which is not the same at all - the technically best thing has rarely, if ever, become the market leader.
I still don't get why this is bad for NVIDIA either. If anything, it brings more people into their ecosystem. NVIDIA is the AI hardware/platform company right now and people will deploy AI for more tasks if it becomes cheaper to do so.
But there's a more subtle point here which I don't see a lot of people talking about, maybe because they know more about this than me. Why wouldn't frontier model developers take DeepSeek R1 techniques and make their models even better or even larger? Or another way to ask: Are the DeepSeek R1 innovations only for making models cheaper (and slightly worse) or can the algorithms developed by the DeepSeek team be scaled up to make more powerful frontier models? Leading edge model developers don't just care about cost, they also want to release models with maximum capabilities. And as we've seen over the past few years, they're willing to pay almost anything to achieve this.
I think most AI researchers know there are still many things to explore in this space so disruptive innovations shouldn't be seen as a bubble burst but as more opportunity.
NVIDIA is in an extremely strong position right now. Even if someone has a major breakthrough on the hardware design side that dramatically lowers the cost of compute for AI workloads (which is highly unlikely), NVIDIA will just create their own implementation that will outperform the original since they have a stranglehold on an entire stack-up of technology: circuits, drivers, libraries and software.
You’re absolutely right. If there are corrections, it’s only because people have gone a little crazy. People sometimes get enamored with the idea of one country or company dominating in a winner-takes-all situation. OpenAI has been trying to spin narratives in which the only rational move is for everyone to invest all of their money in OpenAI, immediately.
I don’t think the Nvidia CEO is losing any sleep. He doesn’t have a planet-sized ego, like some. They are still selling pickaxes during a gold rush, as the saying goes. And the stock is still up 100% in the past year. CEOs don’t control the stock price (much as some try), anyway, the market does. Also, the market is insane. You also can’t control the competition, or what new innovations come along. You just have to run a good business, which they are doing.
Innovation is good for all players. Reality-checks are good. Nothing here is unexpected. There are lots of smart researchers in the world. Software innovations that make better use of hardware are expected. People just like drama.
Like if Tim does a better job at a skateboard trick that has always been Tom’s thing, people want to be that kid who is the one to say, “Oh, snap!!” and won’t stop talking about it at school, because they were there. And how it’s so mind-blowing and previously inconceivable.
Inference was already cheap-ish - businesses can already run GPT-like models by themselves. NVIDIA sales are not driven by inference spending, but by training spend.
While sure, maybe now people spend more on inference, NVIDIA was not an investment thesis built on selling GPUs for inference, because inference requires so many fewer resources.
Also, inference is easier to disrupt, simply because of how it works.
All told, this makes it a much less likely play.
Finally, the assumption that it goes up assumes people want on-prem infrastructure for this (in your case). Maybe true, maybe not.
Overall, going from being a "sure" thing selling $200M worth of clusters to random companies hand over fist, to not necessarily being able to do that, is definitely "bad".
There are other common claims about why it will increase demand, and they rely on different assumptions (that you won't hit a good enough point quickly, etc)
Is anyone canceling their orders? Has anyone announced that the efficiency savings from Deepseek's innovations mean we will have superhuman AGI by the end of the year using existing hardware? Or that existing hardware fully covers all of the expected training demand for the next year?
Or will all of the efficiency savings get immediately absorbed by the demand for better performance and feed the demand for inference?
They don't exactly release this data, as you know, which is why it's so volatile - nobody has any real idea, just best guesses, and lots of different smart people have lots of different opinions.
But I agree in practice that over time Nvidia's path now depends heavily on the answer to "Or will all of the efficiency savings get immediately absorbed by the demand for better performance and feed the demand for inference?"
Before, their next 5-10 years did not really depend on meaningful efficiency savings existing and getting them absorbed - nobody expected meaningful efficiency savings.
There have been continuous efficiency savings innovations for both training and inference over the last few years. Discovering and absorbing those gains is a normal part of the development process. It occurs in every industry and is facilitated in computer science by mechanisms like open source and published papers.
Those savings don't usually make national news because they aren't the target. They aren't "man on the moon" moments, they are "our cannonball flies 96% as far as yours using 30% less powder" moments.
This is still a race for tools that essentially brute force their way to greater utility. Until we hit a utility quotient or hit a wall, the race doesn't really end. The quickest and easiest path to advantage in the race is still more compute.
"There have been continuous efficiency savings innovations for both training and inference over the last few years."
I'm aware - I've run the relevant teams at large-scale companies like Google :)
Is your claim that anyone has achieved 98% total savings vs., say, 5 years ago?
If so, evidence please?
Otherwise, seems sort of irrelevant.
AFAIK, nobody has come anywhere close to this, and the only possible way they would is by externalizing roughly all costs onto someone else.
(i.e., Google TPUs are certainly much more efficient than 5 years ago, but once you account for development cost, they are not saving 98% vs. 5 years ago.)
Nvidia was and still is overpriced. Yes, AI is now at the mainframe stage. Soon it will move to consumer devices with a billion potential users, and the training and the hardware required for it will be very different: much less power consumption, continuous training, etc., just like our brains or animal brains. Consumers did buy computers for $1,000 in the 1990s. They may buy a robot for $10,000 tomorrow. The benchmark for the market cap in this segment is car manufacturers, or Intel in the 1990s.
I believe the article is making the case that the market has already priced in the expectation that training (which is especially what NVIDIA hardware is used for) and inference will get continually more expensive, and DeepSeek implies that the hardware required to train and run inference on really good models has a lower ceiling.
Mostly it just means that we’ve got a new tool we can leverage which is a positive feedback for improving models, and will ultimately increase both train and inference demand and shorten timescales for deeper capabilities.
Absolutely agree on this.
1. DeepSeek just inspired a LOT of startups to develop their own as you no longer need to be a tech-giant to compete on training.
2. Companies with sensitive info will now buy their own GPUs to run their models locally, as the range of applications has increased (as it did with Llama 3).
3. As with 2, new services that were prohibitive with the ChatGPT API will spring up. It's difficult to reliably rent GPUs for services even with enough money, so people will buy more GPUs to host models themselves.
I would understand this if a new CPU/GPU could outperform Nvidia's and DeepSeek had been developed using another vendor's GPUs, but that is not the case. Lower requirements for higher performance have historically never reduced the need for computational capacity. There is very poor reasoning for this move other than purely "technical" (trading-wise) reasons.
Agreed, I believe AI/LLMs will create induced demand for compute long-term, even if we get more efficient models (long-term meaning >6mo). Markets don't look ahead for more than a quarter or two though, so it makes sense there would be a correction based on the news.
It is not bad for Nvidia. It is bad for hype. Nvidia's position was good for a company selling chips with lots of power. But now that not as much power is needed, reliance on Nvidia is lower, and anyone with simpler processors has the capability to sell AI chips.
Nvidia will still sell many chips. It just won’t be the only one capable of selling them. The hype is gone. The moat is gone. The excitement and enthusiasm from investors is gone. This is called a correction.
I think it's also generally understood that Nvidia owns the training space, but not the inference space. They have a lot more competition there and the margins are smaller. More people running AI models is still good for them, but a drop in the bucket compared to the money they were making from training clusters being built out. And I think everyone just realized they can probably make do with their existing clusters.
The question really is: does demand go up at current unit prices? That is, do current profit levels hold?
If training was too expensive previously, is it now cheap enough, or does it need to be even cheaper? If it weren't too expensive, everyone who was going to invest already would have. And if it is still too expensive, well, prices have to go lower or more efficiency has to be found.
I am not entirely sure if there is huge amount of unmet demand for training.
That assumes demand remains constant. Maybe lowering the overall cost to train a model will mean that more people want to train models, thus raising the demand.
There are a lot of folks lining up to use a smart model. This takes tokens. I’m not convinced nvidia is blown up by this news at all. The trade has become less crowded, that is true.
If things become too efficient, then you can use commodity compute and don't need GPUs at all. I'm not sure why you would think breakthroughs in efficiency would be good for Nvidia. Eventually a regular Mac or PC will be able to do what you need an H100 for today. That won't be good for Nvidia.
My bet is there's no such thing as 'too efficient' in this space. If you can get a very good model on a small device, it's going to be totally amazing on a huge GPU.
I bet you're wrong - there are already massive diminishing returns in the best models of 2024 vs. 2023. This idea that you can just throw more compute at it and performance scales with it is fiction. You do get more performance with more compute, but it doesn't scale, and it's a waste of money, as shown by DeepSeek.
this conversation reminds me of people when the PS2 came out saying that by 2010 games would look literally better than real life, because they thought graphics quality would exponentially improve...
I would agree, except I think it's 1997. I've gotten DeepSeek to mostly solve a not-very-complex problem in a somewhat obscure domain (Home Assistant automations) in 214 seconds of its own 'thinking', and if you can get that two orders of magnitude lower, it unlocks completely new use cases, i.e. demand.
The market is already losing patience with "profitable AI". DeepSeek only improves cost, at somewhat worse performance, while even ChatGPT is still far from profitable.
The US has prohibited nvidia from exporting to China, so DeepSeek (the company) won’t be a customer, and others wanting to self-host their models have the freedom to pick any hardware vendor.
This underlines how Nvidia is in trouble; it proves (unless DeepSeek used servers outside of China) that banning Nvidia from China does not confer any kind of advantage anymore.
What happens with excess hardware that's already been bought? It could take years for all of that to be absorbed by smaller companies. And secondly, if this opens the era of self-trained models, then why go for a 671B one and not a smaller one that's fine-tuned to a company's specific needs and can then run on consumer GPUs? At 70B with 8-bit quantization you're good with just three 3090s.
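A back-of-the-envelope check on that last claim (weights only; in practice the KV cache and activations eat into the remaining headroom):

```python
# Does a 70B model at 8-bit quantization fit on three RTX 3090s?
params_billion = 70
bytes_per_param = 1                              # 8-bit quantization
weights_gb = params_billion * bytes_per_param    # ~70 GB of weights

gpus = 3
vram_per_3090_gb = 24
total_vram_gb = gpus * vram_per_3090_gb          # 72 GB

print(f"weights ~{weights_gb} GB vs {total_vram_gb} GB of VRAM")
print("fits:", weights_gb <= total_vram_gb)      # True, with only ~2 GB to spare
```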
Deepseek isn't AGI yet and really still has a long way to go.
Na, we have a long way to go with models, especially when you start adding different modes. We'll still need a metric shitload of compute for a long time.
I don't for a minute believe Deepseek v3 was built with a $6M rental.
Their paper (arXiv:2412.19437) explains they used 2048 H800s. A compute cluster based on 2048 GPUs would have cost around $400M about two years ago when they built it. (Give or take; feel free to post corrections.)
The point is they got it done cheaper than OpenAI/Google/Meta/... etc.
But not cheaply.
I believe the markets are overreacting. Time to buy (tinfa).
They pointed out that the cost calculation is based on those GPUs being rented at $2/hr. They are not factoring in the prior cost of buying those H800s, because they didn't buy them just to build R1. They are not factoring in the cost to build V2 or V2.5. The cost is to build V3. The cost to build R1-Zero and R1 on top of V3 seems far cheaper, and they didn't mention it. They are not factoring in the cost to build out their datacenter, or salaries. Just the training cost. They made it clear: if you could rent equivalent GPUs at $2/hr, it would cost you about $6 million.
"Assuming the rental price of the H800 GPU is $2 per GPU hour, our total training costs amount to only $5.576M. Note that the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs associated with prior research and ablation experiments on architectures, algorithms, or data."
V3 was released a bit over a month ago; V3 is not what took the world by storm, R1 is. The price everyone is talking about is the price for V3.
If this weren't an attempt to sell a false equivalency, at least one story would have details on the equivalent rental cost of compute used to train closed source frontier models from OpenAI, Anthropic, Mistral... Lack of clarity makes it a story.
>>Just the training cost. They made it clear. If you could rent equivalent GPUs at $2/hr, it would cost you about $6million.
This is still quite impressive, given most people are more likely to buy cloud infrastructure from AWS or Azure than build their own datacenter. So the math checks out.
I don't think compute capacity built already will go waste, likely more and bigger things will get built in the coming years so most of it will be used for that purpose.
You’re confusing the metric for reality. The point is to compare the cost of training in terms of node hours with a given configuration. That’s how you get apples to apples. Of course it doesn’t cover building the cluster, housing the machine, the cleaning staff’s pension, or whatever.
The math they gave was 2,788,000 H800 GPU hours[1], with a rental price of $2/GPU-hour[1], which works out to $5.6M. If they did that on a cluster of 2048 H800s, then they could re-train the model every ~1400 hours (~2 months).
If they paid $70,000 per GPU[2] plus $5,000 per 4-GPU compute node (random guess), then the hardware would have cost about $150M to build. If you add in network hardware and other data-center-y things, I could see it reaching the $200M range. IMO $400M might be a bit of a stretch, but not too wildly off base.
To reach parity with the rental price, they would have needed to re-train 70 times (i.e. over 12 years). They obviously did not do that, so I agree it's a bit unfair to cost this based on $2/hour GPU rentals. Why did they buy instead of rent? Probably because it's not actually that cheap to get 2048 concurrent, high-performance-interconnected GPUs for 60 days. Or maybe just because they had cash for capex.
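For reference, the arithmetic from the comments above in one place (the $400M capex figure is the upthread estimate, not an official number):

```python
# Figures quoted upthread: 2,788,000 H800 GPU-hours at $2/GPU-hour,
# a 2048-GPU cluster, and a ~$400M capex estimate for that cluster.
gpu_hours = 2_788_000
rental_rate = 2.0                                     # $ per GPU-hour
cluster_gpus = 2048
capex_estimate = 400e6                                # upthread estimate, in $

run_cost = gpu_hours * rental_rate                    # ~$5.58M per training run
wall_clock_days = gpu_hours / cluster_gpus / 24       # ~57 days (~2 months)
runs_to_match_capex = capex_estimate / run_cost       # ~72 runs
years_to_match = runs_to_match_capex * wall_clock_days / 365

print(f"run cost ~${run_cost/1e6:.2f}M, wall clock ~{wall_clock_days:.0f} days")
print(f"break-even vs capex: ~{runs_to_match_capex:.0f} runs (~{years_to_match:.0f} years)")
```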
Something like an H100 is definitely a feat of engineering, though.
Nothing prevents Cooler Master from releasing a line of GPUs equally performant and, while at it, even cheaper. But when we measure reality, after the wave function of fentanyl and good intentions collapses ... oh yeah, turns out only nVidia is making those chips, whoops ...
Looking around a bit, the price was ~$70k USD _in China_ around the time they were released in 2023; cheaper bulk sales were a thing too.
Note that these are China prices, with a high markup due to export controls etc.
The price of an H800 80 GiB in the US today is more like ~$32k USD.
But to use H800 clusters well you also need the fastest possible interconnects, enough motherboards, enough fast storage, cooling, a building, uninterrupted power, etc. So the cost of building an H800-focused datacenter is much, much higher than multiplying GPU cost by unit count.
You can't buy the GPUs individually, and even if you can on a secondary market, you can't use them without the baseboard, and you can't use the baseboard without a compatible chassis, and a compatible chassis is full of CPUs, system memory, etc. On top of that, you need a fabric. Even if you cheap out and go RoCE instead of IB, it's still 400 Gb/s HCAs, optics, and switches.
Yeah, a node in a cluster costs as much as an American house. Maybe not on its own, but to make it useful for large-scale training, even under the new math of DeepSeek, it costs as much as a house.
They estimated ~$200k per NVIDIA GPU in a complete system, including RAM and networking; that's where my number came from. (RAM and especially very-high-speed networking are very expensive at these scales.)
"Add it all up, and the average selling price of an Nvidia GPU accelerated system, no matter where it came from, was just under $180,000, the average server SXM-style, NVLink-capable GPU sold for just over $19,000 (assuming the GPUs represented around 85 percent of the cost of the machine)"
That implies they assumed an 8-GPU system. (8 × $19,000 = $152,000 ≈ 85% × $180,000)
To clarify, a legitimate benchmark for training is to calculate the running cost, not the capex cost, because obviously the latter drops dramatically with the number of models you train. But to put it in context, Meta wants to spend $50B on AI this year alone, and it already has 150x the compute of DS. The very real math going through investors' heads is: what's stopping Zuck from taking $10B of that and mailing a $100 million signing bonus to every name on the R1 paper?
The $6M that is thrown around is from the DS V3 paper and is for the cost of a single training run for DeepSeek V3 - the base model that R1 is built on.
The number does not include costs for personnel, experiments, data preparation, or chasing dead ends, and most importantly, it does not include the reinforcement learning step that made R1 good.
Furthermore, it does not factor in that both V3 and R1 are built on top of an enormous amount of synthetic data that was generated by other LLMs.
Comparing cost of buying with cost of running is weird. It's not like they build a new cluster, train just this one model, and then incinerate everything.
They bought between 10k and 50k of them before the US restrictions came into place. Sounds like DeepSeek gets to use them for training, as they were profitable (could still be, not sure).
Electricity in China, even at residential rates, is 1/10th the cost it is in CA.
I think the salient point here is that the "price to train" a model is a flashy number that's difficult to evaluate out of context. American companies list the public cloud price to make it seem expensive; Deepseek has an incentive to make it sound cheap.
The real conclusion is that world-class models can now be trained even if you're banned from buying Nvidia cards (because they've already proliferated), and that open-source has won over the big tech dream of gatekeeping the technology.
Over the last few days people have asked me if I think NVIDIA is fkd. It still takes two H100s to run inference on DS V3 671B at <200 tokens per second.
There are different versions of the model as well as using it with different levels of quantization.
Some variants of DeepSeek-R1 can be run on 2x H100 GPUs, and some people have managed to get quite decent results with the even more heavily distilled models running on consumer hardware.
For DeepSeek-V3 even with 4bit quantization you need more like 16x H100.
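Rough weight-memory math behind that correction (quantized weights only; KV cache, activations, and framework overhead push real requirements higher; the 32B/70B sizes are the published R1 distills):

```python
H100_VRAM_GB = 80

def weights_gb(params_billion: float, bits: int) -> float:
    """Memory needed just to hold the quantized weights."""
    return params_billion * bits / 8

full_v3 = weights_gb(671, 4)       # ~335 GB -- already more than 2x H100 (160 GB)
distill_32b = weights_gb(32, 4)    # ~16 GB  -- fits a single 24 GB consumer GPU
distill_70b = weights_gb(70, 4)    # ~35 GB  -- fits one H100 or two 24 GB cards

print(f"V3/R1 671B @ 4-bit: ~{full_v3:.0f} GB vs 2x H100 = {2 * H100_VRAM_GB} GB")
print(f"32B distill: ~{distill_32b:.0f} GB, 70B distill: ~{distill_70b:.0f} GB")
```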
All this media frenzy around DS R1 makes me feel sick to my stomach.
It has increased the noise in the AI space by orders of magnitude. Every media outlet is bombarding you with a relentless torrent of half-true information, exaggerated interpretations of single facts, and speculation.
Sometimes I wonder whether this is amplified by a state actor?
I'm also curious how MiniMax-01, which is also an excellent model with impressive improvements, went by completely unnoticed.
The only good thing is that this almost certainly puts an end to OpenAI's price gouging - pretty sure that $200/month individual plan was not the limit of their imagination.
It can't get sillier than this. All DeepSeek did was show that NVIDIA products hold more value than believed, and that US dominance is more illusory than thought. It has zero negative bearing on the eventual path to AGI.
And that's somehow supposed to be end of NVIDIA? Hello?
If we look at the processing requirements in nature, I think that the main trend in AI going forward is going to be doing more with less, not doing less with more, as the current scaling is going.
Thermodynamic neural networks may also basically turn everything on its ear, especially if we figure out how to scale them like NAND flash.
If anything, I would estimate that this is a space-race type effort to “win” the AI “wars”. In the short term, it might work. In the long term, it’s probably going to result in a massive glut in accelerated data center capacity.
The trend of technology is towards converging with or doing better than natural processes, not doing it 100000x less efficiently. I don’t think AI will be an exception.
If we look at what is - theoretically - possible using thermodynamic wells with current model architectures, for instance, we could (theoretically) make a network that applies 1T parameters in something like 1 cm². It would use about 20 watts, back of the napkin, and be able to generate a few thousand T/s.
Operational thermodynamic wells have already been demonstrated in silicon. There are scaling challenges, cooling requirements, etc., but AFAIK no theoretical roadblocks to scaling.
Obviously, the theoretical doesn’t translate to results, but it does correlate strongly with the trend.
So the real question is, what can we build that can only be done if there are hundreds of millions of NVIDIA GPUs sitting around idle in a few years? Or alternatively, if those systems are depreciated and available on secondary markets?
Also evidenced by 3B and 7B models becoming ever more capable, gaining qualities that were previously only achieved by models a magnitude larger.
Bigger models are more capable, but smaller models can be iterated on faster. For a couple of months now, most of the impressive achievements have been in increasingly smaller and cheaper models. DeepSeek just had the perfect storm of impressive results, accessibility, and international rivalry that made it go viral.
It's actually the beginning of test-time scaling. R1 has shown that a very simple reinforcement learning scheme can be used to teach the model to think in a chain of thought as an emergent property.
No additional pretraining data needed! Only more compute.
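A minimal sketch of the mechanism being described, as I understand it from the public R1 report: sample a group of answers per prompt, score them with a simple verifiable reward, and use group-normalized rewards as advantages (no learned value model). The reward function and sampled answers here are toy illustrations, not DeepSeek's actual code:

```python
import numpy as np

def group_relative_advantages(rewards):
    """GRPO-style advantages: normalize each sampled answer's reward
    against the mean/std of its own group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

def verifiable_reward(answer: str, reference: str) -> float:
    """Toy rule-based reward: 1 if the final answer matches, else 0."""
    return 1.0 if answer.strip() == reference.strip() else 0.0

# One prompt, a group of sampled chain-of-thought completions (final answers
# stubbed out here), scored purely by outcome -- no extra pretraining data.
reference = "42"
sampled_final_answers = ["42", "41", "42", "7"]
rewards = [verifiable_reward(a, reference) for a in sampled_final_answers]
advantages = group_relative_advantages(rewards)

print(rewards)      # [1.0, 0.0, 1.0, 0.0]
print(advantages)   # positive for correct answers, negative for wrong ones
# These advantages would then weight the policy-gradient update on each completion.
```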
There was no market reaction when Deepseek r1 was released.
There was no market reaction when Deepseek v3 was released.
There was a massive market reaction when a Deepseek app was released.
There is too much damn irrationality in the AI and AI adjacent market right now, and pop articles like this are only making it worse.
Jevons Paradox absolutely holds for model development, and Deepseek should (and is - all my cybersecurity startup peers are investigating the feasibility of building their own models now) be viewed as an opportunity for smaller or medium stage companies to build their own competitive domain specific models, while not having to pay what is essentially protection money to OpenAI or Anthropic.
The only people at risk with the democratization of model development are the investors in foundational model companies
---------
Also, I hope Deepseek FINALLY reprioritizes distributed systems education in CS.
There are too many "ML engineers" who do not know the basics of OS or hardware engineering (e.g. they cannot optimize Mellanox hardware or really dig into workload optimization) and are thus burning resources inefficiently. Most MLEs I meet now just import a package and glue code together WITHOUT understanding the performance capabilities of their compute.
Most universities in the US either don't require OS or computer architecture classes for CS majors (e.g. Harvard) or dumb them down significantly (e.g. not going deep into Linux kernel implementation, scheduling, etc.).
This is why the entire cybersecurity industry has largely shifted to Israel and India, and the skillset overlaps significantly with MLOps and MLEng (if you know how to implement spinlocks in C you can easily be taught CUDA)
Jevons paradox has been quoted to death here - it really needs pointing out that it applies to a commodity in general, not to Nvidia. Nvidia is not selling just coal; it's selling a super-purified, ultra-powerful coal that only it can make. Jevons paradox just says coal (compute) usage will increase, not that this specific type of coal will.
And this is quite separate from the business side of it. For example, Jevons paradox is quite apparent in the airline industry, but airlines are notorious for possibly never having made a net profit, because of high capex and aggressive price competition. Jevons paradox is not any reason at all for a runaway valuation of a company.
I agree, but the existing moat that Nvidia has due to CUDA and the entire HPC ecosystem still holds. And this is why it's relevant to treat Nvidia as the majority of the GPU space segment.
Furthermore, it's Nvidia that owns Mellanox - the owner of the InfiniBand IP used in just about every DC or cluster, because no other vendor has come close on interconnect performance.
You're spot on here. Most people are looking at it wrong. Someone here once put it succinctly: Nvidia isn't selling GPUs, it's selling InfiniBand clusters. The whole NVLink/CUDA ecosystem is the key, and a mass of separate players (Cerebras, Google TPUs for inference, Mojo for CUDA, AMD/Intel/Huawei etc. for GPUs) is very unlikely to break that. In fact NVLink is so powerful that, based on the DeepSeek paper, it appears to have been nerfed (classic Nvidia market segmentation strategy), and a big part of squeezing out their compute was hacking around that.
That said, the reason I don't invest in Nvidia is that they're selling one more thing - AI legitimacy. Every tech shop is buying Nvidia because all their CEOs need to go to the board and say, we've got such-and-such AI strategy. Sure as hell Meta isn't spending 50 billion a year because AI makes them that much damn money (supposedly ML saved their ass when Apple introduced App Tracking Transparency, but 99% odds it's not the kind of ML that needs a gigawatt GPU cluster to run). And the very real calculus for investors/CEOs is: if DeepSeek made this on a thousandth of the budget (conflicting reports, but it's highly probable DS has a fraction of the compute), what's stopping Zuckerberg from taking 10B of that capex spend and offering every name on that DeepSeek paper a 10 million salary?
How about the more charitable hypothesis: It's a relevant dynamic because this is a situation where the paradox most strongly applies: efficiency improvement in use of an input whose output has insatiable demand. And so people bring it up, even if they just recently learned it from another comment.
It's okay to say something true ... even if someone else already said it elsewhere.
I never said it was wrong; I'm just saying a lot of people became instant experts on topics they had no idea even existed 24 hours ago, including people with grand techno-prophecies and people giving financial advice.
If someone says something correct that they just learned, they don't deserve ridicule just because they learned it recently and others are saying it too.
Correct me if I'm wrong, but Jevons paradox says nothing about profits from selling coal, the growth rates of coal consumption, etc.
Nvidia currently has ~75% margins and ~90% growth. If either of those knobs gets turned down slightly, their valuation can tank.
Which is what happened.
If Nvidia's 10-year growth forecast went from ~40% per year (29x in 10 years) to ~30% per year (14x in 10 years) - that's a 50% smaller future company you're expecting.
It's still great for Nvidia. It's just LESS great than people thought before, which means their market cap comes down a ton.
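The compounding arithmetic behind that (a quick sketch using the 40% and 30% growth rates assumed in the comment above):

```python
# Same starting revenue, two different 10-year growth assumptions.
years = 10
high, low = 0.40, 0.30

multiple_high = (1 + high) ** years    # ~28.9x
multiple_low = (1 + low) ** years      # ~13.8x

print(f"{multiple_high:.1f}x vs {multiple_low:.1f}x -> "
      f"{multiple_low / multiple_high:.0%} of the previously expected size")
```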
Nvidia had 65% margins 5 years ago when they primarily sold gaming GPUs.
What margins do you think Nvidia is charging for RTX cards, which get more and more expensive every generation?
What margins do you think Project Digits will have for Nvidia? That thing costs 50% more than an RTX card, and on an RTX card Nvidia only earns on the chip, which is only part of the card. Digits is a 100% Nvidia product.
Does it matter whether 1 Blackwell DC GPU or 13 Digits units are sold? No. Nvidia probably has an insane margin on Digits and has only one goal: total spread and integration of Nvidia HW in any AI workload, because then Nvidia can offer SW with super margins.
Think of Nvidia Enterprise AI, Omniverse, Clara, Isaac, DriveSim, and many more. If 95% of AI accelerators are from Nvidia, then it's only a small step to also use Nvidia SW frameworks.
Everything I know about ML is self-taught. My local uni has an ML engineering degree that's relatively new. Every once in a while I look at the curriculum; outside of some higher-level math, 90% of that four-year degree consists of topics I "mastered" in the first 6-12 months of self-study trying to build prediction models for various sports.
If I could go back I would start by reading Josh Starmer's StatQuest Guide to Machine Learning, and then his guide to AI/Neural Networks[0]. Starmer does the best job of explaining advanced ML topics in a very beginner-friendly way; the books are literally written in the format of a children's book.
Then just start tinkering. I got interested in ML because of sports analytics and betting markets, so I read a lot of papers on that topic and books similar to Bayesian Sports Models in R by Andrew Mack[1]. Also, Jake VanderPlas's Python Data Science Handbook is good[2].
Ideally, find a vertical you're interested in where experts have applied ML and read their papers/books and work backwards from there.
Do you know how to optimize the MTU of an Infiniband interconnect? Do you understand how to schedule multiple models being trained concurrently? Do you know why you cannot directly leverage bare metal compute within a Docker image?
This is important Infra knowledge you need to take full advantage of any model you are training on your own hardware. And this is why Deepseek was successful - they understood the ins-and-outs of systems programming and the H800 architecture to maximize the compute performance they needed to train their model.
What about Linux namespaces makes you think it's not running on bare metal? Maybe you are thinking of versions of Docker that run Linux under a hypervisor, like on macOS?
I don't think the part about the market reaction is too surprising, for COVID it took a while for the market to react as well.
Seeing that DeepSeek-R1 has benchmark scores like o1 is not the same as seeing that people actually like it, that it's being adopted, and that stated training costs are getting accepted as credible.
(But I agree that there is a lot of irrationality in these markets.)
Or people in charge cannot properly evaluate things :)
It is very possible, when things require advanced technical knowledge and experience
><"Old Man yelling at clouds" rant over>
They are too busy deploying ruby apps and juggling jsons over https to mess with ugly cpp, segfaults, kernel details, hardware intrinsics and semiconductors
At the scale that the selloffs happened, that meant there were some shifts by institutional investors. These are the kinds of managers that would have technical people on retainer to consult (eg. Guidepoint).
> They are too busy deploying ruby apps and juggling jsons over https
Ruby is "legacy" as well nowadays. Most younger devs I meet tend to really only understand JS and Python, and that too while heavily relying on outside packages or dependencies.
Not a bad thing per se, but if a VC like me has deeper knowledge of OS internals or Mellanox tuning than some of the (American) MLEs at companies they've done due diligence on, something's very wrong with the talent pipeline.
Humans are generally not very good at properly evaluating things in general. They are exceptionally good at believing themselves to be very good at properly evaluating things though.
It is why empires collapse from miscalculations about war by their experts, it is the foundational mental model underlying communism that simply cannot work but keeps being attempted, and it is how countries can be destroyed in every which way while the experts maintain that everything is just fine.
Never underestimate the powers of the ego, the destroyer of worlds. I mean they don’t know what they are doing, but we have advanced technical knowledge and experience, so we should clearly be in charge of all things, including those beyond our narrow scope of advanced technical knowledge and experience.
To be fair, capitalism doesn't really work either: you get a very unstable system with cycles (which people try to stabilize in ad hoc ways), you optimize for the wrong things long term (right now producing far more goods than we need, without factoring in the devastating costs that climate change will bring in a few decades, or the destruction of nature and eradication of species), and when things crash, many people are hurt. And it practically needs wars and destruction to reset wealth distribution every 50 to 100 years.
No, communism doesn't work either. I wish there were more alternatives.
And some of the usual examples of "socialism doesn't work" might have worked if the US hadn't interfered. E.g. in Chile, no system would have survived the US actively supporting Pinochet. We will never know how socialism would have played out there if the US (and the Soviets too) had left it alone.
R1 and o1 point towards a world where training models will be a small bucket of overall compute. That doesn't mean the total amount of AI compute will stop accelerating, just that interconnected mega-clusters are not the only or most efficient way to run the majority of future workloads. That should be negative news for the company that is currently the only one capable of making chips for these clusters, and positive news for the players that can run inference on a single chip, as they will be able to grab a bigger part of the compute pie.
Were the US sanctions on chip sales to China worth it? Yesssss. They were. They still are. (But I frown on this action.)
China is good at dumb, stupid manufacturing. It does this with so much enthusiasm that HNers are left scratching their heads at the excitement. The possibility of China manufacturing the next computing powerhouse was too high - higher than it could ever be in the US.
The threat to US leadership was that the fastest supercomputer for AI in the world could be in China (the first sputnik moment per Barack Obama). So the chip embargo was to prevent that from happening.
So China did not spend/waste money and time building hardware. They spent it on software. Normally the Chinese version is cheap enough and good enough, but this time it was cheap enough and surprisingly terrific. Props to Liang, the founder, for having that game face on.
It's not smuggling if Nvidia vendors openly sell it to them :) I'm pretty sure most of Russia's military equipment is filled with Texas Instruments MCUs. These export controls don't work.
I find it hard to believe that serious investors thought vertical progress in AI had peaked and that it would be only horizontal expansion from now on. We know it takes decades for that to happen in any industry.
It seems more like investors were looking for a significant event to pop the obvious bubble of the last year.
Which investors, specifically, were looking for a significant event to pop the obvious bubble?
Investors who were actually invested, or onlookers with FOMO who hadn't invested yet?
While I think the "AI" companies invested in are overpriced*, I think the market as a whole hasn't even begun to price in the changes in store across non-tech sectors and industries even if LLMs were to advance no further than now, with all additional progress at the prompting and agentic levels.
* Even here, Jevons paradox applies. So what's being called "priced for perfection" may actually be priced for business as usual with multiplied demand.
Time to give our family its own shared, networked AI station, accessible from any of our devices. Then I'll trust it with all of our personal, medical, and legal docs. Seems to me the market is expanding, not contracting.
Is DeepSeek's architecture not scalable? In a couple of months I can see some company with access to GPUs creating a much larger and better-performing model based on the DeepSeek architecture.