Testing AMD's Giant MI300X (chipsandcheese.com)
213 points by klelatti on June 25, 2024 | 244 comments


Impressions from last week’s CVPR, a computer vision conference with 12k attendees - pretty much everyone is using NVIDIA GPUs, pretty much nobody is happy with the prices, and everyone would like some competition in the space:

NVIDIA was there with 57 papers, a website dedicated to their research presented at the conference, a full day tutorial on accelerating deep learning, and ever present with shirts and backpacks in the corridors and at poster presentations.

AMD had a booth at the expo part, where they were raffling off some GPUs. I went up to them to ask what framework I should look into when writing kernels (ideally from Python) for GPGPU. They referred me to the “technical guy”, who it turned out had a demo of inference on an LLM. Which he couldn’t show me, as the laptop with the APU had crashed and wouldn’t reboot. He didn’t know about writing kernels, but told me there was a compiler guy who might be able to help. He wasn’t to be found at that moment, though, and I couldn’t find him when I returned to the booth later.

I’m not at all happy with this situation. As long as AMD’s investment into software and evangelism remains at ~$0, I don’t see how any hardware they put out will make a difference. And you’ll continue to hear people walking away from their booth saying “oh, when I win it I’m going to sell it to buy myself an NVIDIA GPU”.


> I’m not at all happy with this situation. As long as AMD’s investment into software and evangelism remains at ~$0, I don’t see how any hardware they put out will make a difference.

It appears AMD’s initial strategy is courting the HPC crowd and hyperscalers: they have big budgets and lower support overhead, and are willing and able to write code that papers over AMD’s not-great software while appreciating the lower-than-Nvidia TCO. I think this incremental strategy is sensible, considering where most of the money is.

As a first mover, Nvidia had to start from the bottom up; CUDA used to run only/mostly on consumer GPUs. AMD is going top-down, starting with high-margin DC hardware before trickling down to rack-level users, and eventually to APUs as revenue growth allows more re-investment.


They’re making the wrong strategic play.

They will fail if they go after the highest margin customers. Nvidia has every advantage and every motivation to keep those customers. They would need a trillion dollars in capital to have a chance imho.

It would be like trying to go after Intel in the early 2000s by targeting server CPUs, or going after the desktop operating system market in the 90s against Microsoft. It’s aiming at your competition where they are strongest and you are weakest.

Their only chance to disrupt is to try to get some of the customers that Nvidia doesn’t care about, like consumer-level inference / academic or hobbyist models. Intel failed when they got beaten in a market they didn’t care about, i.e. mobile / low-power devices.


This is a common sentiment, no doubt also driven by the wish that AMD would cater to us.

But I see no evidence that the strategy is wrong or failing. AMD is already powering a massive and rapidly growing share of Top 500 HPC:

https://www.top500.org/statistics/treemaps/

AMD compute growth isn't in places where people see it, and I think that gives a wrong impression. (Or it means people have missed the big shifts over the last two years.)


It would be interesting to see how much these "supercomputers" are actually used, and what parts of them are used.

I use my university's "supercomputer" every now and then when I need lots of VRAM, and there are rarely many other users. E.g. I've never had to queue for a GPU even though I use only the top model, which probably should be the most utilized.

Also, I'd guess there can be nvidia cards in the grid even if "the computer" is AMD.

Of course it doesn't matter for AMD whether the compute is actually used or not as long as it's bought, but lots of theoretical AMD flops standing somewhere doesn't necessarily mean AMD is used much for compute.


It is a pretty safe bet that if someone builds a supercomputer there is a business case for it. Spending big on compute and then leaving it idle is terrible economics. I agree with Certhas in that although this is not a consumer-first strategy, it might be working. AMD's management are not incapable, for all that they've been outmanoeuvred convincingly by Nvidia.

That being said, there is a certain irony and schadenfreude in the bricked AMD laptop from the thread root. The AMD engineers are at least aware that running a compute demo on their products is an uncomfortable experience. The consumer situation is not acceptable even if strategically AMD is doing OK.


I find it a safer bet that there are terrible economics all over. Especially when the buyers are not the users, as is usually the case with supercomputers (just like with all "enterprise" stuff).

In the cluster I'm using there are 36 nodes, of which 13 are currently not idling (which doesn't mean they are computing). There are 8 V100 GPUs and 7 A100 GPUs, and all are idling. Admittedly it's holiday season and 3AM here, but it's similar at other times too.

This is of course great for me, but I think the safer bet is that the typical load average of a "supercomputer" is under 0.10. And the less useful the hardware, the less will be its load.


It is not a reasonable assumption to compare your local cluster to the largest clusters within DOE or their equivalents in Europe/Japan. These machines regularly run at >90% utilization and you will not be given an allocation if you can’t prove that you’ll actually use the machine.

I do see the phenomenon you describe on smaller university clusters, but these are not power users who know how to leverage HPC to the highest capacity. People in DOE spend their careers working to use as much of these machines as efficiently as possible.


In Europe at least, supercomputers are organised in tiers. Tier 0 are the highest grade; tier 3 are small local university clusters like the one you describe. Tier 2 and Tier 1 machines and upward usually require you to apply for time. They are definitely highly utilised. At Tier 3 the situation will be very different from one university to the next. But you can be sure that funding bodies will look at utilisation before deciding on upgrades.

Also, this amount of GPUs is not sufficient for competitive pure ML research groups, from what I have seen. The point of these small, decentralized, underutilized resources is to have slack for experimentation. Want to explore an ML application with a master's student in your (non-ML) field? Go for it.

Edit: No idea how much of the total HPC market is in the many small installs vs the fewer large ones. My instinct is that funders prefer to fund large centralised infrastructure, and getting smaller decentralised stuff done is always a battle. But that's all based on very local experience, and I couldn't guess how well this generalises.


    > It is a pretty safe bet that if someone builds a supercomputer there is a business case for it.
As I understand, most (95%+) of the market for supercomputers is gov't. If wrong, please correct. Else, what do you mean by "business case"?


When you ask your funding agency for an HPC upgrade or a new machine, the first thing they will want from you are utilisation numbers of current infrastructure. The second thing they will ask is why you don't just apply for time on a bigger machine.

Despite the clichés, spending taxpayer money is really hard. In fact my impression is always that the fear that resources get misused is a major driver of the inefficient bureaucracies in government. If we were more tolerant of taxpayer money being wasted we could spend it more efficiently. But any individual instance of misuse can be weaponized by those who prefer for power to stay in the hands of the rich...


At least where I'm from, new HPC clusters aren't really asked for by the users, but they are "infrastructure projects" of their own.

With the difficulty of spending taxpayer money, I fully agree. I even think HPC clusters are a bit of a symptom of this. It's often really hard to buy a beefy enough workstation of your own that would fit the bill, or to just buy time from cloud services. Instead you have to faff with an HPC cluster and its bureaucracy, because it doesn't mean extra spending. And especially not doing a tender, which is the epitome of the inefficiency caused by the paranoia about wasted spending.

I've worked for large businesses, and it's a lot easier to spend on all sorts of useless stuff in those, at least when the times are good. When the times get bad, the (pointless) bureaucracy and red tape easily gets worse than in government organizations.


> At least where I'm from, new HPC clusters aren't really asked for by the users, but they are "infrastructure projects" of their own.

Because the users expect them to be renewed and improved. Otherwise the research can’t be done. None of our users tell us to buy new systems. But they cite us like mad, so we can buy systems every year.

The dynamics of this ecosystem are different.


> It would be interesting to see how much these "supercomputers" are actually used, and what parts of them are used.

I’m in that ecosystem. Access is limited, demand is huge. There are literal queues and breakneck competition to get time slots. Same for CPU and GPU partitions.

They generally run at ~95% utilization. Even our small cluster runs at 98%.


Did your university not have a bioinformatics department?


It does. And meteorology, climatology and cosmology for example.


Well then I'm really unsure what's happening. Any serious researcher in any of those fields should be able to, and trying to, expand into all available supercompute.


Maybe they just don't need them? At least a bioinformatics/computational science professor I know runs most of his analyses on a laptop.


I see a lot of evidence, in the form of a rising moat for NVidia.


Supercomputers are in 95% of cases government funded, and I recommend that you check the conditions in tenders and how governments impose certain conditions on purchases. That isn't a normal business partner who only looks at performance; there are many other criteria in the decision making.

Or let me ask you directly: can you name me one enterprise which would buy a supercomputer, wait 5+ years for it, and fund the development of HW for it which doesn't exist yet? At the same time, when the competition can deliver a supercomputer within the year with an existing product?

No sane CEO would have done Frontier or El Capitan. Such things work only with government funding, where the government decides to wait and fund an alternative. But AMD is indeed a bit lucky that it happened, or otherwise they wouldn't have been forced to push the Instinct line.

In the commercial world, things work differently. There is always a TCO calculation. But one critical aspect since the 90s is SW. No matter how good the HW is, the opportunity costs in SW can force enterprises to use the inferior HW due to SW deployment. If vision computing SW in industry supports and is optimized for CUDA, or even runs only with CUDA, then any competition has a very hard time penetrating that market. They first have to invest a lot of money to make their products equally appealing.

AMD is making a huge mistake and is by far not paranoid enough to see it. For 2 decades, AMD and Intel have been in a nice spot, with PC and HPC computing requiring x86. To this date that has basically guaranteed a steady demand. But in that timeframe mobile computing has been lost to ARM. ML/AI doesn't require x86, as Nvidia demonstrates by combining their ARM CPUs into the mix, and ARM themselves want more and more of the PC and HPC computing cake. And MS is eager to help with OS-for-ARM solutions.

What that means is that if some day x86 isn't as dominant anymore and ARM becomes equally good, then AMD/Intel will suddenly have more competition in CPUs and might even offer non-x86 solutions as well. Their position will therefore degrade into yet another commodity CPU offering.

In the AI accelerator space we will witness something similar. Nvidia has created a platform and earns tons of money with it by combining and optimizing SW+HW. Big Tech is great at SW but not yet at HW. So the only logical thing to do is to get better at HW. All the large tech companies are working on their own accelerators, and they will build their platforms around them to compete with Nvidia and lock in customers in the same way. The primary losers in all of this will be HW-only vendors without a platform, hoping that Big Tech will support them on their platforms. Amazon and Google have already shown that they have no intention of supporting anything besides their own platforms and Nvidia (which they support only due to customer demand).


I am that crazy CEO building a supercomputer, for rent by anyone who wants it. We are starting small and growing with demand.

Our first deployment has 3x the FLOPS of Cheyenne at a fraction of the cost.

https://en.wikipedia.org/wiki/Cheyenne_(supercomputer)


The savings are an order of magnitude different. Switching from Intel to AMD in a data center might have saved millions if you were lucky. Switching from NVidia to AMD might save the big LLM vendors billions.


Nvidia have less moat for inference workloads since inference is modular. AMD would be mistaken to go after training workloads but that's not what they're going after.


I only observe this market from the sidelines... but

They're able to get the high end customers, and this strategy works because they can sell the high end customers high end parts in volume without having to have a good software stack; at the high end, the customers are willing to put in the effort to make their code work on hardware that is better in dollars/watts/availability or whatever it is that's giving AMD inroads into the supercomputing market. They can't sell low end customers on GPU compute without having a stack that works, and somebody who has a small GPU compute workload may not be willing or able to adapt their software to make it work on an AMD card even if the AMD card would be a better choice if they could make it work.


They’re going to sell a billion dollars of GPUs to a handful of customers while NVIDIA sells a trillion dollars of their products to everyone.

Every framework, library, demo, tool, and app is going to use CUDA forever and ever while some “account manager” at AMD takes a government procurement officer to lunch to sell one more supercomputer that year.


I'd guess that the majority of ML software is written in PyTorch, not in CUDA, and PyTorch has support for multiple backends including AMD. torch.compile also supports AMD (generating Triton kernels, same as it does for NVIDIA), so for most people there's no need to go lower level.
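To illustrate (a minimal sketch of my own, not from the article): on a ROCm build of PyTorch, the AMD GPU is exposed through the usual "cuda" device string, so ordinary model code and torch.compile run unchanged on either vendor's hardware.

    import torch
    import torch.nn as nn

    # On ROCm builds of PyTorch, AMD GPUs show up behind the usual "cuda"
    # device string, so this code is identical on NVIDIA and AMD hardware.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    model = nn.Sequential(
        nn.Linear(1024, 4096),
        nn.GELU(),
        nn.Linear(4096, 1024),
    ).to(device)

    # torch.compile lowers to Triton kernels on both the CUDA and ROCm backends.
    model = torch.compile(model)

    x = torch.randn(8, 1024, device=device)
    print(model(x).shape)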


GPUs are used for more than only ML workloads.

CUDA's relevance in the industry is so big now that NVidia has several WG21 seats and helps drive the heterogeneous programming roadmap for C++.


You can use PyTorch for more than ML. No need to use backprop. Think of it as GPU-accelerated NumPy.
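A quick sketch of that usage, just for illustration - plain array math on the GPU with autograd disabled, nothing model- or vendor-specific:

    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # NumPy-style number crunching on the GPU; no gradients or backprop involved.
    with torch.no_grad():
        a = torch.rand(4096, 4096, device=device)
        b = torch.rand(4096, 4096, device=device)
        c = a @ b                      # dense matrix multiply
        spectrum = torch.fft.rfft2(c)  # FFTs, reductions, etc. work the same way
        print(c.mean().item(), spectrum.shape)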


I would like to see OctaneRender done in Pytorch. /s


Sure, but if the OctaneRender folk wanted to support AMD, then I highly doubt they'd be interested in a CUDA compatibility layer either - they'd want to be using the lowest-level API possible (Vulkan?) to get close to the metal and optimize performance.


See, that is where you got it all wrong: they dropped Vulkan for CUDA, and even gave a talk about it at GTC.

https://www.cgchannel.com/2023/11/otoy-releases-first-public...

https://www.cgchannel.com/2023/11/otoy-unveils-the-octane-20...

And then again, there are plenty of other cases where PyTorch makes absolutely no sense on a GPU, which was the whole starting point.


> See, that is where you got it all wrong

I said that if they wanted to support AMD they would use the closest-to-metal API possible, and your links prove that this is exactly their mindset - preferring a lower level more performant API to a higher level more portable one.

For many people the tradeoffs are different and ability to write code quickly and iterate on design makes more sense.


Nvidia's 2024 data center revenue was $46B. They've got a long fucking way to go to get to a trillion dollars of product.


Take a look at this chart going back ~3Y: https://ycharts.com/indicators/nvidia_corp_nvda_data_center_...

Their quarterly data centre revenue is now $22.6B! Even assuming that it immediately levels off, that's $90B over the next 12 months.

If it merely doubles, then they'll hit a total of $1T in revenue in about 6 years.

I'm an AI pessimist. The current crop of generative LLMs are cute, but not a direct replacement for humans in all but a few menial tasks.

However, there's a very wide range of algorithmic improvements available which wouldn't have been explored three years ago. Nobody had the funding, motivation, or hardware. Suddenly, everyone believes that it is possible, and everyone is throwing money at the problem. Even if the fruits of all of this investment are just a ~10% improvement in business productivity, that's easily worth $1T to the world economy over the next decade or so.

AMD is absolutely leaving trillions of dollars on the table because they're too comfortable selling one supercomputer at a time to government customers.

Those customers will stop buying their kit very soon, because all of the useful software is being written for CUDA only.


Did you look at your own chart? There's no trend of 200% growth. Rather, the last few quarters were a huge jump from relatively modest gains in the years prior. Expecting 6 years of “merely doubling” is absolutely bonkers lol

Who can even afford to buy that much product? Are you expecting Apple, Microsoft, Alphabet, Amazon, etc to all dump 100% of their cash on Nvidia GPUs? Even then that doesn't get you to a trillion dollars


Once AI becomes a political spending topic like green energy, I think we’ll see nation level spending. Just need one medical breakthrough and you won’t be able to run a political campaign without AI in your platform.


Meta alone bought 350,000 H100 GPUs, which cost them $10.5 billion: https://www.pcmag.com/news/zuckerbergs-meta-is-spending-bill...

This kind of AI capital investment seems to have helped them improve the feed recommendations, doubling their market cap over the last few years. In other words, they got their money back many times over! Chances are that they're going to invest this capital into B100 GPUs next year.

Apple is about to revamp Siri with generative AI for hundreds of millions of their customers. I don't know how many GPUs that'll require, but I assume... many.

There's a gold rush, and NVIDIA is the only shovel manufacturer in the world right now.


> Meta alone bought 350,000 H100 GPUs, which cost them $10.5 billion

Right, which means you need about a trillion dollars more to get to a trillion dollars. There's not another 100 Metas floating around.

> Apple is about to revamp Siri with generative AI for hundreds of millions of their customers. I don't know how many GPUs that'll require, but I assume... many.

Apple also said they were doing it with their silicon. Apple in particular is all but guaranteed to refuse to buy from Nvidia even.

> There's a gold rush, and NVIDIA is the only shovel manufacturer in the world right now.

lol no they aren't. This is literally a post about AMD's AI product even. But Apple and Google both have in-house chips as well.

Nvidia is the big general party player, for sure, but they aren't the only one. And more to the point, exponential growth of the already-largest player for 6 years is still fucking absurd.


The GDP of the US alone over the next five years is $135T. Throw in other modern economies that use cloud services like Office 365 and you’re over $200T.

If AI can improve productivity by just 1% then that is $2T more. If it costs $1T in NVIDIA hardware then this is well worth it.


(note to conversation participants - I think jiggawatts might be arguing about $50B/qtr x 24 qtr = $1 trillion and kllrnohj is arguing $20 billion * 2^6 years = $1 trillion - although neither approach seems to be accounting for NPV).

That is assuming Nvidia can capture the value and doesn't get crushed by commodity economics. Which I can see happening, and I can also see not happening. Their margins are going to be under tremendous pressure. Plus I doubt Meta are going to be cycling all their GPUs quarterly; there is likely to be a rush and then a settling of capital expenses.


Another implicit assumption is that LLMs will be SoTA throughout that period, or the successor architecture will have an equally insatiable appetite for lots of compute, memory and memory bandwidth; I'd like to believe that Nvidia is one research paper away from a steep drop in revenue.


Agreed with @roenxi and I’d like to propose a variant of your comment:

All evidence is that “more is better”. Everyone involved professionally is of the mind that scaling up is the key.

However, like you said, just a single invention could cause the AI winds to blow the other way and instantly crash NVIDIA’s stock price.

Something I’ve been thinking about is that the current systems rely on global communications which requires expensive networking and high bandwidth memory. What if someone invents an algorithm that can be trained on a “Beowulf cluster” of nodes with low communication requirements?

For example the human brain uses local connectivity between neurons. There is no global update during “training”. If someone could emulate that in code, NVIDIA would be in trouble.


> They will fail if they go after the highest margin customers.

They are already powering the most powerful supercomputers, so I guess you’re right.

Oh, by coincidence, the academic crowd is the primary user of these supercomputers.

Pure luck.


AMD did go after Intel's server CPUs in the 2000s, with quite a bit of success.


And it worked mainly because they were a drop-in for Intel processors. Which was and is an amazing feat. I and most people could and can run anything compiled (except AVX-512 stuff back then on Zen 1 and 2?) without a hitch. And it was still a huge uphill battle, and Intel let it happen, what with their bungling of the 10nm process.

I don't see how the same can work here. HIP isn't it right now (every time I try, anyway).


> They would need a trillion dollars in capital to have a chance imho.

All AMD would really need is for Nvidia innovation to stall. Which, with many of their engineers coasting on $10M annual compensation, seems not too far-fetched.


AMD can go toe to toe with Nvidia on hardware innovation. What AMD has realised (correctly, IMO) is that all they need is for hyperscalers to match/come close to Nvidia on software innovation on AMD hardware - Amazon/Meta/Microsoft engineers can get their foundation models running on MI300X well enough for their needs - CUDA is not much of a moat in that market segment where there are dedicated AI-infrastructure teams. If the price is right, they may shift some of those CapEx dollars from Nvidia to AMD. Few AI practitioners - and even fewer LLM consumers - care about the libraries underpinning torch/numpy/high-level-python-framework/$LLM-service, as long as it works.


That is the wrong move; personally I would start with the local LLM/llama folks who crave more memory and build up from there.


Seeing that they don't have a mature software stack, I think for now AMD would prefer one customer who brings in $10m revenue over 10'000 customers at $1000 a pop.


It doesn't make sense, because they can market to both at the same time.


> It appears AMD’s initial strategy is courting the HPC crowd and hyperscaler...

I don't agree with this at all! Give me something that I can easily prototype at home and then quickly scale up at work!


> As long as AMD’s investment into software and evangelism remains at ~$0

Last time I checked, they have been trying to hire a ton of software engineers to improve the applied stacks (CV, ML, DSP, compute, etc.) at a location near where I'm based.

It seems like there's a big push to improve the stacks, but given that less than 10 years ago they were practically at death's door, it's not terribly surprising that their software is in the state it is. It's been getting better gradually, but quality software doesn't just show up overnight, especially when things are as complex and arcane as they are in the GPU world.


With margins that high?

There is always financing, there are always people willing to go to the competitor at some wage, there is always a way if the leadership wants to.

If it was just a straight up fab bottleneck? Yeah maybe you buy that for a year or two.

“During Q1, Nvidia reported $5.6 billion in cost of goods sold (COGS). This resulted in a gross profit of $20.4 billion, or a margin profile of 78.4%.”

That’s called an “induced market failure”.


They literally bought Xilinx for their software engineering team. That's at least a thousand firmware engineers and software engineers focused on software stack improvements. That was two years ago. And on top of Xilinx they've been hiring staff like crazy for years now.

The issue was that they basically let everyone go who wasn't building hardware for their essential product lines (CPU & GPU), other than a skeleton crew to keep the software at least mostly functioning. And as much as this seems like it was a bad decision, AMD was probably weeks from bankruptcy by the time they got Zen out the door even despite doing this. Had they not done so, they'd almost certainly have closed up entirely.

So for the last ~5 years minimum now they've been building back their software teams and trying to recuperate what they lost in institutional knowledge. That all takes time to do even if you hire back twice as many engineers as you lost.

And so now we are here. Things are clearly improving but nowhere near acceptable yet. But there's a trend of improvement.


> Things are clearly improving

How long am I supposed to wait, as my still-modern AMD GPU sits still-unsupported?

The anecdote above doesn't even sound like there's any improvement at all, let alone "clear" improvement.

And with Zen in 2017 and Zen+ in 2018 the counter is past six years at this point since the money gates opened wide.


> How long am I supposed to wait, as my still-modern AMD GPU sits still-unsupported?

Which GPU do you have? At least according to these docs, on Linux the upper chunk of RDNA3 is supported officially, but from experience, basically all 6xxx or 7xxx cards are unofficially supported if you build it for your target arch. 5xxx cards get the short end of the stick and got skipped (they were a rough launch), but Radeon VII cards should also still be officially supported (with support shifting to unofficial status in the next release).

https://rocm.docs.amd.com/en/latest/compatibility/compatibil...

And given that ROCm is pretty core to AMD's support for the windows AI stack (via ONNX), you can assume any new GPUs released from here on out will be supported.


It's 5xxx. And "rough launch" is not an excuse. They've had plenty of time. Is it that different from the other RDNA cards?

The unofficial support for so many cards is not a good situation either.

Edit: Actually, no, I know it's not that different, because some versions of ROCm largely work on RDNA1 if you trick them. They are just refusing to do the extra bit of work to patch over the differences.


I mean, it apparently works on RDNA1 now after some effort, but they never really attempted to support it, because they initially only supported workstation RDNA cards and there was no workstation RDNA1 release.

https://www.reddit.com/r/ROCm/comments/1bd8vde/psa_rdna1_gfx...

I wish they had comprehensive support for basically all recent GPU releases but tbh I'd rather they focus on perfecting support for the current and upcoming generations than spread their efforts too thin.

And ideally backports to the older cards will come with time, but it's really not a priority over issues on the current generation, because those RDNA1 cards were never actually supported in the first place.


Every post I see about trying it has the person run into issues, but maybe Soon it will finally be true.


Have you ever organized anything of size?

Financing is not the bottleneck. Organizational capacity might well be, though. As an organization, AMD's survival depended not on competing with nVidia but on competing with Intel. Now they are established, in what must be one of the greatest come-from-behind successes in tech history. 8 years ago, Intel was worth 80 times as much as AMD; today AMD has surpassed them:

https://www.financecharts.com/compare/AMD,INTC/summary/marke...

Stock isn't reality, but I wouldn't so easily assume that the team that led AMD to overtake Intel are idiots.


> With margins that high? There is always financing, there are always people willing to go to the competitor at some wage, there is always a way if the leadership wants to.

People love to pop off about stuff they really don't know anything about. Let me ask you: what financing do you imagine is available? Like literally what financing do you propose for a publicly traded company? Like do you realize they can't actually issue new shares without putting it to a shareholder vote? Should they issue bonds? No, I know, they should run an ICO!!!

And then what margins exactly? Do you know what the margin is on MI300? No. Do you know whether they're currently selling at a loss to win marketshare? No.

I would be the happiest boy if HN, in addition to policing jokes and memes, could police arrogance.


Are you saying that companies lose the ability to secure financing once they go public?


Of course not - I mentioned 3 routes to securing further financing. Did you read about those 3 routes in my comment?


You mentioned them all mockingly. If you weren't trying to suggest none are viable, you need to reword.


This isn't hard: financing routes exist but they aren't as simple or easy or straightforward as the person to whom I was responding makes it seem.


They didn't imply it was notably easy. Your reply there only makes sense if you were trying to say it's nearly impossible. If you're just saying it's kinda hard then your post is weirdly hostile for no reason, reading theirs in an extreme way just so you can correct it harder.


> They didn't imply it was notably easy

Really? I must be reading a different language than English here

> There is always financing, there are always people willing to go to the competitor at some wage, there is always a way if the leadership wants to.


If "always a way" implies anything about difficulty, it implies that there are challenges to overcome, not ease.


I guess there's always a way to play devil's advocate <shrug>


Have you looked into TinyCorp [0]/tinygrad [1], one of the latest endeavors by George Hotz? I've been pretty impressed by the performance. [2]

[0] https://tinygrad.org/ [1] https://github.com/tinygrad/tinygrad [2] https://x.com/realGeorgeHotz/status/1800932122569343043?t=Y6...


I have not been impressed by the perf. Slower than PyTorch for LLMs, and PyTorch is actually stable on AMD (I've trained 7B/13B models).. so the stability issues seem to be more of a tinygrad problem and less of an AMD problem, despite George's ramblings [0][1]

[0] https://github.com/tinygrad/tinygrad/issues/4301 [1] https://x.com/realAnthonix/status/1800993761696284676


He also shakes his fist at the software stack, but loudly enough that it gets AMD to react to it.


As more of a business person than an engineer, help me understand why AMD isn't getting this - what's the counterargument? Is CUDA just too far ahead? Are they lacking the right people in senior leadership roles to see this through?


CUDA is very far ahead. Not only technically, but in mindshare. Developers trust CUDA and know that investing in CUDA is a future proof investment. AMD has had so many API changes over the years, that no one trusts them any more. If you go all in on AMD, you might have to re-write all your code in 3-5 years. AMD can promise that this won't happen, but it's happened so many times already that no one really believes them.

Another problem is simply that hiring (and keeping) top talent is really, really hard. If you're smart enough to be a lead developer of AMD's core machine learning libraries, you can probably get hired at any number of other places, so why choose AMD?

I think the leadership gets it and understand the importance, I just don't think they (or really anybody) knows how to come up with a good plan to turn things around quickly. They're going to have to commit to at least a 5 year plan and lose money each of those 5 years, and I'm not sure they can or even want to fight that battle.


> Another problem is simply that hiring (and keeping) top talent is really really hard.

Absolutely. And when your mandate for this top talent is going to be "go and build something that basically copies what those other guys have already built", it is even harder to attract them, when they can go any place they like and work on something new.

> I think the leadership gets it and understand the importance, I just don't think they (or really anybody) knows how to come up with a good plan to turn things around quickly.

Yes, it always puzzles me when people think nobody at AMD actually sees the problem. Of course they see it. Turning a large company is incredibly hard. Leadership can give direction, but there is so much baked in momentum, power structures, existing projects and interests, that it is really tough to change things.


CUDA is one area that Nvidia really nailed. When it was first announced I saw it as something neat, but I could have never envisioned just how ingrained it would become. This was long before AI training/execution was really on most people's radars.

But for years I have heard the same things from so many people working in the field. "We hate Nvidia because they got it so right but are the only option."


As another commenter points out, their strategy appears to be to focus on HPC clients, where AMD can concentrate after-sale software support on a relatively small number of customer requests. This gets them some sales while avoiding the level of organizational investment necessary to build a software platform that can support NVIDIA-style broad compatibility and a good out-of-the-box experience.


Yes, to add to the other comments, what many don't realize is that CUDA is an ecosystem - C, C++ and Fortran foremost - but NVidia quickly realized that supporting any programming language community that wants to target PTX was a very good idea.

Their GPUs were re-designed to follow the C++ memory model, and many NVidia engineers have seats at ISO C++, making CUDA the best way to run heterogeneous C++. Something that Intel also realized, by acquiring CodePlay, key players in SYCL, and also employing ISO C++ contributors.

Then there are the Visual Studio and Eclipse plugins, and graphical debuggers that even allow you to single-step shaders if you so wish.


> are they lacking the right people in senior leadership roles to see this through?

Just like Intel, they have an outdated culture. IMHO they should start a software Skunk Works isolated from the company and have the software guys guide the hardware features. Not the other way around.

I wouldn't bet money on either of them doing this. Hopefully some other smaller, modern, and flexible companies can try it.


CUDA is a software moat. If you want to use any GPU other than Nvidia's, you need to double your engineering budget because there are no easy-to-bootstrap projects at any level. The hardware prices are meaningless if you need a $200k engineer, if they even exist, just to bootstrap a product.


Depending on your hardware budget, the engineering one can look like a rounding error.


Sure, but then you're still on the side of NVIDIA because you have the budget.


Why give any additional money to Nvidia when you can announce more profits (or get more compute if you're a government agency) by hiring more engineers to enable AMD hardware for less than a few million per year? It's not like Microsoft loves the idea of handing over money to Nvidia if there is a cheaper alternative that can make $MSFT go up.


Say your success rate for replicating CUDA+Nvidia hardware on AMD is 60%. But it will take 2 years. That's not going to be compelling for any large org, especially when the MI300X is cheaper, but not crazy cheaper, than an H100.

Especially since CUDA is still rolling out new functionality and optimizations, so the goal posts will keep moving.


> Say your success rate for replicating CUDA+Nvidia[...]

Rational hyperscalers would just stop as soon as their tooling/workloads/models are functional on AMD hardware within an acceptable perf envelope - just like they already do with their custom silicon. Replicating CUDA is just unnecessary, expensive and time-consuming completionism; if some workloads require CUDA, they will be executed on Nvidia clusters that are part of the fleet.


It depends on how much cheaper the total solution is and how available the hardware is. If I can't get Nvidia hardware until six months after I get AMD hardware, I have a couple of months to port my software to AMD and still beat my competitor that's waiting for Nvidia. It's always a matter of how many problems you can solve for a given amount of money x time.


Sure, but the "it depends" is carrying a lot of weight. NVIDIA's moat will get you testable software straight out the gate; any other stack currently is a game of "how long can we take to get this going".

Corporations simply aren't interested in long-term gains unless there's a straightforward path.


It depends on the problems you have. If you need CUDA, then you married yourself to Nvidia. If you can use libraries that work equally well on both, then you would benefit.

When you are a government agency, it’s more palatable to spend the budget in a way that results in employment of nationals and development of indigenous technologies.


Because if you don't join NVIDIA, your likelihood of success goes down. So the “more profits” you speak of is gambling money. Most corporations aren't going to gamble.


Depends on you needing CUDA or not. If you don’t, you can use anything.

It was this same game with x86 and ARM is eroding the former king’s place in the datacenter.


Leadership lacking vision + being almost bankrupt until relatively recently.


MIVisionX is probably the library you want for computer vision. As for kernels, you would generally write HIP, which is very similar to CUDA. To my knowledge, there's no equivalent to cupy for writing kernels in Python.

For what it's worth, your post has cemented my decision to submit a few conference talks. I've felt too busy writing code to go out and speak, but I really should make time.



Oh cool! It appears that I've already packaged cupy's required dependencies for AMD GPU support in the Debian 13 'main' and Ubuntu 24.04 'universe' repos. I also extended the enabled architectures to cover all discrete AMD GPUs from Vega onwards (aside from MI300, ironically). It might be nice to get python3-cupy-rocm added to Debian 13 if this is a library that people find useful.
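For anyone wondering what kernels-from-Python look like on that stack, here is a minimal sketch (assuming a working ROCm build of CuPy, e.g. the python3-cupy-rocm packaging mentioned above): cupy.RawKernel compiles the source with hipRTC on ROCm, so a simple CUDA-style kernel string works unchanged.

    import cupy as cp

    # On a ROCm build of CuPy, this kernel source is compiled with hipRTC;
    # simple CUDA-style kernels like this need no changes for AMD GPUs.
    saxpy = cp.RawKernel(r'''
    extern "C" __global__
    void saxpy(const float a, const float* x, const float* y, float* out, int n) {
        int i = blockDim.x * blockIdx.x + threadIdx.x;
        if (i < n) {
            out[i] = a * x[i] + y[i];
        }
    }
    ''', 'saxpy')

    n = 1 << 20
    x = cp.random.rand(n, dtype=cp.float32)
    y = cp.random.rand(n, dtype=cp.float32)
    out = cp.empty_like(x)

    threads = 256
    blocks = (n + threads - 1) // threads
    saxpy((blocks,), (threads,), (cp.float32(2.0), x, y, out, cp.int32(n)))

    print(cp.allclose(out, 2.0 * x + y))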


HIP isn't similar to CUDA in the set of available languages that target PTX, the existing library ecosystem, IDE plugins, or graphical debuggers.

This is the kind of stuff AMD keeps missing out on; even oneAPI from Intel looks better in that regard.


If you are looking for attention from an evangelist, I'm sorry but you are not the target customer for MI300. They are courting the Hyperscalers for heavy duty production inference workloads.


I also stopped by their booth and talked about trial access, and right away asked for easy access a la Google Colab, specifically without bureaucracy. And they were like "yeah, we are making it, but nah man, you can't just log in and use it, you gotta fill out a form and wait for us to approve it". I was very disappointed at that point.

That was a marketing guy BTW. I don't think they realize their marketing strategies suck.


Completely agree. It's been 18 years since Nvidia released CUDA. AMD has had a long time to figure this out so I'm amazed at how they continue to fumble this.


10 years ago AMD was selling its own headquarters so that it could stave off bankruptcy for another few weeks (https://arstechnica.com/information-technology/2013/03/amd-s...).

AMD's software investments began in earnest a few years ago, but AMD really did progress more than pretty much everyone else aside from NVidia, IMO.

AMD further made a few bad decisions where they “split the bet”, relying upon Microsoft and others to push software forward. (I did like C++ AMP, for what it's worth.) The underpinnings of C++ AMP led to Boltzmann, which led to ROCm, which then needed to be ported away from C++ AMP and into the CUDA-like HIP.

So it's a bit of a misstep there for sure. But it's not like AMD has been dilly-dallying. And for what it's worth, I would have personally preferred C++ AMP (a C++11-standardized way to represent GPU functions as []-lambdas rather than CUDA-specific <<<extensions>>>). Obviously everyone else disagrees with me, but there's some elegance to parallel_for_each([](param1, param2){magically a GPU function executing in parallel}), where the compiler figures out the details of how to get param1 and param2 from CPU RAM into the GPU (or you use GPU-specific allocators to make param1/param2 in the GPU codespace already, to bypass the automagic).


Nowadays you can write regular C++ in CUDA if you so wish, and unlike AMD, NVidia employs several WG21 contributors.


CUDA of 18 years ago is very different to CUDA of today.

Back then AMD/ATI were actually at the forefront on the GPGPU side - things like the early Brook language and CTM led pretty quickly into things like OpenCL. Lots of work went on using the Xbox 360 GPU in real games for GPGPU tasks.

But CUDA steadily improved iteratively, and AMD kinda just... stopped developing their equivalents? Considering that for a good part of that time they were near bankruptcy, it might not have been surprising though.

But saying Nvidia solely kicked off everything with CUDA is rather ahistorical.


> AMD kinda just... stopped developing their equivalents?

It wasn't so much that they stopped developing; rather, they kept throwing everything out and coming out with new and non-backwards-compatible replacements. I knew people working in the GPU compute field back in those days who were trying to support both AMD/ATI and NVidia. While their CUDA code just worked from release to release, and every new release of CUDA just got better and better, AMD kept coming up with new breaking APIs and forcing rewrite after rewrite, until they just gave up and dropped AMD.


> CUDA of 18 years ago is very different to CUDA of today.

I've been writing CUDA since 2008 and it doesn't seem that different to me. They even still use some of the same graphics in the user guide.


Yep! I used BrookGPU for my GPGPU master's thesis, before CUDA was a thing. AMD lacked follow-through on the software side, as you said, but a big factor was also NV handing out GPUs to researchers.


10 years ago they were basically broke and bet the farm on Zen. That bet paid off. I doubt a bet on CUDA would have paid off in time to save the company. They definitely didn't have the resources to split that bet.


It's not like the specific push for AI on GPUs came out of nowhere either, Nvidia first shipped cuDNN in 2014.


Did you talk to anyone from Intel? It seems they were also present: https://community.intel.com/t5/Blogs/Tech-Innovation/Artific...


Well if Mojo and Modular Max Platform take off I guess there will be a path for AMD


Well,

"Modular to bring NVIDIA Accelerated Computing to the MAX Platform"

https://www.modular.com/blog/modular-partners-with-nvidia-to...


The whole point of Max is that you can compile same code to multiple targets without manually optimizing for a given target. They are obviously going to support NVIDIA as a target.


Yet you haven't seen any AMD or Intel deal from them.


'Cause they start with the target with the largest user base?


99%+ of people aren't writing kernels man, this doesn't mean anything, this is just silly


The news you've all been waiting for!

We are thrilled to announce that Hot Aisle Inc. proudly volunteered our system for Chips and Cheese to use in their benchmarking and performance showcase. This collaboration has demonstrated the exceptional capabilities of our hardware and further highlighted our commitment to cutting-edge technology.

Stay tuned for more exciting updates!


Thank you for loaning the box out! Has a lot more credibility than the vendor saying it runs fast


Thanks Jon, that's exactly the idea. About $12k worth of free compute on a box that costs as much as a Ferrari.

Funny that HN doesn't like my comment for some reason though.


Don't sweat it. Some people are trigger happy on downvoting things looking like self-promotion due to the sheer amount of spam everywhere. Your sponsorship (?) is the right way to promote your company. Thank you.


It reads like the kind of chumbox PR you read at the bottom of a random website. Get a copywriter or something like writer.ai. I thought your comment was spam and nearly flagged it. It really is atrocious copy.


I thought it was sarcastic.


[retracted]


Do you think this comment will make Hot Aisle more or less likely to loan out their hardware in the future?

Personally, I couldn't care less about the quality of copy. I do care about having access to similar hardware in the future.


Heh, I didn't even think of that, but you make a good point. Don't worry though, we will keep the access coming. I hate to say it, but it literally is... stay tuned for more exciting updates.


Thanks so much for doing that. There are loads of people here who really appreciate it. We will stay tuned!


This is the news that many people have been waiting for and we do have more exciting updates coming. There is another team on the system now doing testing. We have a list of 22 people currently waiting.


okay, I've retracted my comments. Thanks for your generosity.


Thanks, but I wouldn't call it generosity. We're helping AMD build a developer flywheel and that is very much to our benefit. The more developers using these chips, the more chips that are needed, the more we buy to rent out, the more our business grows.

Previously, this stuff was only available to HPC applications. We're trying to get these into the hands of more developers. Our view is that this is a great way to foster the ecosystem.

Our simple and competitive pricing reflects this as well.


All eyes are of course on AI, but with 192GB of VRAM I wonder if this or something like it could be good enough for high end production rendering. Pixar and co still use CPU clusters for all of their final frame rendering, even though the task is ostensibly a better fit for GPUs, mainly because their memory demands have usually been so far ahead of what even the biggest GPUs could offer.

Much like with AI, Nvidia has the software side of GPU production rendering locked down tight though so that's just as much of an uphill battle for AMD.


One missed opportunity from the game streaming bubble would be a 20-or-so player game where one big machine draws everything for everybody and streams it.


It would immediately prevent several classes of cheating. No more wallhacks or ESP.

Ironically the main type that'd still exist would be the vision-based external AI-powered target-highlighting and aim/fire assist.

The display is analysed and overlaid with helpful info (like enemies highlighted) and/or inputs are assisted (snap to visible enemies, and/or automatically pull trigger.)


Stuff like this is still of interest to me. There are some really compelling game ideas that only become possible once you look into modern HPC platforms and streaming.


My son and I have wargamed it a bit. The trouble is that there is a huge box of tricks used in open-world and other complex single-player games for conserving RAM that competes with just having a huge amount of RAM, and it is not so clear that the huge SMP machine with a huge GPU really comes out ahead in terms of creating a revolution in gaming.

In the case of Stadia, however, failing to develop this was like a sports team not playing any home games. One way of thinking about the current crisis of the games industry and VR is that building 3D worlds is too expensive, and a major part of that is all the shoehorning tricks the industry depends on. Better hardware for games could be about lowering development cost as opposed to making fancier graphics, but that tends to be a non-starter with companies whose core competence is getting 1000 highly-paid developers to struggle with difficult-to-use tools; the idea that you could do the same with 10 ordinary developers is threatening to them.


I am thinking beyond the scale of any given machine and traditional game engine architectures.

I am thinking of an entire datacenter purpose-built to host a single game world, with edge locations handling the last mile of client-side prediction, viewport rendering, streaming and batching of input events.

We already have a lot of the conceptual architecture figured out in places like the NYSE and CBOE - Processing hundreds of millions of events in less than a second on a single CPU core against one synchronous view of some world. We can do this with insane reliability and precision day after day. Many of the technology requirements that emerge from the single instance WoW path approximate what we have already accomplished in other domains.


EVE Online is more or less the closest to this so far, so it may be worth learning lessons from them (though I wouldn't suggest copying their approach: their Stackless Python behemoth codebase appears to contain many a horror). It's certainly a hard problem though, especially when you have a concentration of large numbers of players (which is inevitable when you create such a game world).


The question, though, is how you make something that complex and not have it be a horror, and whether Stackless Python is really the culprit of the horror vs anything else they could have built it in.


Curious what that is. Some kind of AR physics simulation?

I have been thinking about whether the compute could go right in cellphone towers, but this would take it up a notch.


Stadia was supposed to allow for really big games distributed across a cluster. Too bad it died in the crib.


I’d imagine ray tracing is a bit easier to parallelize over lots of older cards. The computations aren’t as heavily linked and are more fault tolerant. So I doubt anyone is paying H100-style premiums.


The computations are easily parallelized, sure, but the data feeding those computations isn't easily partitioned. Every parallel render node needs as much memory as a lone render node would, and GPUs typically have nowhere near enough for the highest of high end productions. Last I heard they were putting around 128GB to 256GB of RAM in their machines and that was a few years ago.


Pixar is paying a massive premium; they probably are using an order of magnitude or two more CPUs than they would if they could use GPUs. Using a hundred CPUs in place of a single H100 is a greater-than-h100 style premium.


Would Pixar's existing software run on GPUs without much work?


It does already, at least on Nvidia GPUs: https://rmanwiki.pixar.com/pages/viewpage.action?mobileBypas...

They currently only use the GPU mode for quick iteration on relatively small slices of data though, and then switch back to CPU mode for the big renders.


It's probably implemented way differently, but I worry about the driver suitability. Gaming benchmarks at least perform substantially worse on AI accelerators than even on GPUs that are many generations old; I wonder if this extends to custom graphics code too.


I work in this field, and I think so. This is actually the project I'm currently working on.

I'm betting that with current hardware and some clever tricks, we can resolve full production frames at real-time rates.


I hate the state of AMD's software for non-gamers. ROCm is a war crime (which has improved dramatically in the last two years and still sucks).

But like many have said, considering AMD was almost bankrupt, their performance is impressive. This really speaks for their hardware division. If only they could get the software side of things fixed!

Also I wonder if NVIDIA has an employee of the decade plaque for CUDA. Because CUDA is the best thing that could’ve happened to them.


I feel like these huge graphics cards with insane amounts of RAM are the moat that AI companies have been hoping for.

We can't possibly hope to run the kinds of models that run on 192GB of VRAM at home.


On the contrary, I'd argue the opposite. GPU VRAM has gotten faster, but the density isn't that good. 8GB used to be high end in the early 2000s, yet now 16GB can't even run games that well, especially if it's a studio that loves VRAM.

Side note: as someone who has been into machine learning for over 10 years, let me tell ya us hobbyists and researchers hunger for compute and memory.

VRAM isn't everything... I am well aware, but certain workflows really do benefit from heaps of VRAM, like VFX, CAD, and CFD. I still hold on to the dream of upgradable GPUs, where I could upgrade the different components just like you do on a computer: the computer is slow, so you upgrade RAM or storage, or get a faster chip that uses the same socket. GPUs could possibly see modularity with the processor, the VRAM, etc.

Level1Techs has some great videos about how PCIe is the future, where we can connect systems together using raw PCIe lanes, which is similar to how Nvidia Blackwell servers communicate with other servers in the rack.


Wasn't that just because of Nvidia's market segmentation?


Apple will gladly sell you a GPU with 192GB of memory, but your wallet won't like it.


Won't Nvidia, and Intel, and Qualcomm, and Falanx (who make the ARM Mali GPUs from what I can see), and Imagination Technologies (PowerVR) do the same? They each make a GPU, and if you pay them enough money I have a hard time believing they won't figure out how to slap enough RAM on a board for one of their existing products and make whatever changes are required.


The US government is looking into heavily limiting the availability of high-end GPUs from now on. And the biggest and most effective bottleneck for AI right now is VRAM.

So maybe Apple is happy to sell huge GPUs like that, but the government will probably put them under export controls like the A100 and H100 already are.


Cue the PowerMac G4 TV ad.

https://youtu.be/lb7EhYy-2RE


OTOH, it comes free with one of the finest Unix workstations ever made.


It's easy to be best when you have no competition. Linux exists for the rest of us.


It’s good even if compared to Linux. Not perfect, but certainly not bad.


Which Unix workstation?


They are referring to MacOS being included with expensive Mac hardware.


How many desktop systems can have 192GB visible to the GPU? How many cost less than a Mac?


Just because it has a lot of GPU RAM doesn't mean it's actually useful for people doing ML work.

How many companies use Macs for ML work instead of Nvidia and Cuda?


It won’t be as fast as a high-end GPU like the MI300 series, but it’s enough to check whether the code works before running it on a high-end GPU-heavy machine, and the large GPU-accessible RAM simplifies the code enormously, as you don’t have to partition and shuffle data between CPU and GPU.
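A small sketch of that workflow in PyTorch (my own illustration, nothing specific to the parent's setup): the same script picks Apple's Metal backend on a Mac and a discrete GPU elsewhere, so you can smoke-test locally before renting the big machine.

    import torch

    def pick_device() -> torch.device:
        # Prefer a discrete GPU (CUDA or ROCm), fall back to Apple's Metal
        # backend on a Mac, and finally to the CPU.
        if torch.cuda.is_available():
            return torch.device("cuda")
        if torch.backends.mps.is_available():
            return torch.device("mps")
        return torch.device("cpu")

    device = pick_device()
    # With unified memory, a large tensor fits without CPU<->GPU shuffling.
    x = torch.randn(2, 8192, 8192, device=device)
    y = torch.matmul(x, x)
    print(device, y.shape)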


Ok, that's the theory, but how many companies actually do that in their workflow? All the ML companies I've seen use CUDA directly from prototyping to production and don't bother with Apple ML unless their target happens to be exclusively iPhones.


Anyone doing heavy lifting and low-level tooling will do better to optimise for specialised training and inference engines. Usage will depend on where the abstraction layer is - if you want to see CUDA, then you'll need Nvidia. If all you care about is the size of the model and you know it's very large, then the Apple hardware becomes competitive.

Besides, you'd be well served with a Mac as a development desktop anyway.


Everyone has laptops now, though. Nobody's gonna carry a Mac Studio between home and office. And if you're gonna use your Mac just as an SSH machine, then you'll remote into an Nvidia data center anyway, not a Mac Studio.


I still have a Mac Mini on my desk in my home office, regardless of the laptops. If I were into crunching 192 gigabytes of numbers at a time, I’d get myself a Mac Studio.

At least until someone makes an MI300A workstation.


Sure, but then if you take your code to production to monetize it as a business, you won't be deploying on a datacenter of Mac Minis.

What you alone do at home is irrelevant for the ML market as a whole, along with your Mac Mini; you alone won't move the market, and the companies serious about ML are all-in on Nvidia and CUDA-compatible code for mass deployment.

I can also run some NNs on a microcontroller, but my hobby project won't move the market, and that's what I was talking about: the greater market, not your hobby project.


>> We can't possibly hope to run the kinds of models that run on 192GB of VRAM at home.

I'm looking to build a mini-ITX system with 256GB of RAM for my next build. The DDR5 spec can support that in 2 modules, but nobody makes them yet. No need for a GPU; I'm looking at the AMD APUs, which are getting into the 50 TOPS range. But yes, RAM seems to be the limiting factor. I'm a little surprised the memory companies aren't pushing harder for consumers to have that capacity.


128GB DDR5 module - https://store.supermicro.com/us_en/supermicro-hynix-128gb-28...

It is of course RDIMM, but you didn't specify what memory type you were looking at.


For inference you could use a maxed-out Mac Ultra; the RAM is shared between the CPU and GPU.


For single user (batch_size = 1), sure. But that is quite expensive in $/tok.


Even if the community provides support it could take years to reach the maturity of CUDA. So while it's good to have some competition, I doubt it will make any difference in the immediate future. Unless some of the big corporations in the market lean in heavily and support the framework.


If, and that's a big if, AMD can get ROCm working well for this chip, I don't think this will be a big problem.

ROCm can be spotty, especially on consumer cards, but for many models it does seem to work on their more expensive cards. It may be worth spending a few hours/days/weeks to work around the peculiarities of ROCm, given the cost difference between AMD and Nvidia in this market segment.

This all stands or falls with how well AMD can get ROCm to work. As this article states, it's nowhere near ready yet, but one or two updates can turn AMD's accelerators from "maybe in 5-10 years" to "we must consider this next time we order hardware".

I also wonder if AMD is going to put any effort into ROCm (or a similar framework) as a response to Qualcomm and other ARM manufacturers creaming them on AI stuff. If these Copilot PCs take off, we may see AMD invest into their AI compatibility libraries because of interest from both sides.
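
For what it's worth, a quick way to gauge whether a given ROCm setup is even in the "maybe usable" category: the ROCm builds of PyTorch reuse the torch.cuda API, so a minimal smoke test (assuming a ROCm wheel of PyTorch is installed) looks like this and tells you whether the card is visible and whether a basic fp16 matmul survives.

    import torch

    # On ROCm builds of PyTorch, HIP devices are exposed through the torch.cuda
    # API, so the same check works on both vendors' cards.
    print("accelerator visible:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("device:", torch.cuda.get_device_name(0))
        print("HIP runtime:", torch.version.hip)  # None on a CUDA build

        a = torch.randn(2048, 2048, device="cuda", dtype=torch.float16)
        b = torch.randn(2048, 2048, device="cuda", dtype=torch.float16)
        c = a @ b                                  # basic GEMM smoke test
        torch.cuda.synchronize()
        print("matmul finite:", c.isfinite().all().item())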


https://stratechery.com/2024/an-interview-with-amd-ceo-lisa-...

"One of the things that you mentioned earlier on software, very, very clear on how do we make that transition super easy for developers, and one of the great things about our acquisition of Xilinx is we acquired a phenomenal team of 5,000 people that included a tremendous software talent that is right now working on making AMD AI as easy to use as possible."


Oh no. Ohhhh nooooo. No, no, no!

Xilinx dev tools are awful. They are the ones who had Windows XP as the only supported dev environment for a product with guaranteed shipments through 2030. I saw Xilinx defend this state of affairs for over a decade. My entire FPGA-programming career was born, lived, and died, long after XP became irrelevant but before Xilinx moved past it, although I think they finally gave in some time around 2022. Still, Windows XP through 2030, and if you think that's bad wait until you hear about the actual software. These are not role models of dev experience.

In my, err, uncle? post I said that I was confused about where AMD was in the AI arms race. Now I know. They really are just this dysfunctional. Yikes.


Xilinx made triSYCL (https://github.com/triSYCL/triSYCL), so maybe there's some chance AMD invests first-class support for SYCL (an open standard from Khronos). That'd be nice. But I don't have much hope.


Comparing what AMD has done so far with SYCL, and what Intel has done with oneAPI, yeah, better not keep that hope flame burning.


this is honestly a very enlightening interview because - as pointed out at the time - Lisa Su is basically repeatedly asked about software and every single time she blatantly dodges the question and tries to steer the conversation back to her comfort-zone on hardware. https://news.ycombinator.com/item?id=40703420

> He tries to get a comment on the (in hindsight) not great design tradeoffs made by the Cell processor, which was hard to program for and so held back the PS3 at critical points in its lifecycle. It was a long time ago so there's been plenty of time to reflect on it, yet her only thought is "Perhaps one could say, if you look in hindsight, programmability is so important". That's it! In hindsight, programmability of your CPU is important! Then she immediately returns to hardware again, and saying how proud she was of the leaps in hardware made over the PS generations.

> He asks her if she'd stayed at IBM and taken over there, would she have avoided Gerstner's mistake of ignoring the cloud? Her answer is "I don’t know that I would’ve been on that path. I was a semiconductor person, I am a semiconductor person." - again, she seems to just reject on principle the idea that she would think about software, networking or systems architecture because she defines herself as an electronics person.

> Later Thompson tries harder to ram the point home, asking her "Where is the software piece of this? You can’t just be a hardware cowboy ... What is the reticence to software at AMD and how have you worked to change that?" and she just point-blank denies AMD has ever had a problem with software. Later she claims everything works out of the box with AMD and seems to imply that ROCm hardly matters because everyone is just programming against PyTorch anyway!

> The final blow comes when he asks her about ChatGPT. A pivotal moment that catapulted her competitor to absolute dominance, apparently catching AMD unaware. Thompson asks her what her response was. Was she surprised? Maybe she realized this was an all-hands-on-deck moment? What did NVIDIA do right that you missed? Answer: no, we always knew and have always been good at AI. NVIDIA did nothing different to us.

> The whole interview is just astonishing. Put under pressure to reflect on her market position, again and again Su retreats to outright denial and management waffle about "product arcs". It seems to be her go-to safe space. It's certainly possible she just decided to play it all as low key as possible and not say anything interesting to protect the share price, but if I was an analyst looking for signs of a quick turnaround in strategy there's no sign of that here.

not expecting a heartfelt postmortem about how things got to be this bad, but you can very easily make this question go away too, simply by acknowledging that it's a focus and you're working on driving change and blah blah. you really don't have to worry about crushing some analyst's mindshare on AMD's software stack because nobody is crazy enough to think that AMD's software isn't horrendously behind at the present moment.

and frankly that's literally how she's governed as far as software too. ROCm is barely a concern. Support base/install base, obviously not a concern. DLSS competitiveness, obviously not a concern. Conventional gaming devrel: obviously not a concern. She wants to ship the hardware and be done with it, but that's not how products are built and released in the 2020s anymore.

NVIDIA is out here building integrated systems that you build your code on and away you go. They run NVIDIA-written CUDA libraries, NVIDIA drivers, on NVIDIA-built networks and stacks. AMD can't run the sample packages in ROCm stably (as geohot discovered) on a supported configuration of hardware/software, even after hours of debugging just to get it that far. AMD doesn't even think drivers/runtime is a thing they should have to write, let alone a software library for the ecosystem.

"just a small family company (bigger than NVIDIA, until very recently) who can't possibly afford to hire developers for all the verticals they want to be in". But like, they spent $50b on a single acquisition, they spent $12b in stock buybacks over 2 years, they have money, just not for this.


So I knew that AMD's compute stack was a buggy mess -- nobody starts out wanting to pay more for less and I had to learn the hard way how big of a gap there was between AMD's paper specs and their actual offerings -- and I also knew that Nvidia had a huge edge at the cutting edge of things, if you need gigashaders or execution reordering or whatever, but ML isn't any of that. The calculations are "just" matrix multiplication, or not far off.

I would have thought AMD could have scrambled to fix their bugs, at least the matmul related ones, scrambled to shore up torch compatibility or whatever was needed for LLM training, and pushed something out the door that might not have been top-of-market but could at least have taken advantage of the opportunity provided by 80% margins from team green. I thought the green moat was maybe a year wide and tens of millions deep (enough for a team to test the bugs, a team to fix the bugs, time to ramp, and time to make it happen). But here we are, multiple years and trillions in market cap delta later, and AMD still seems to be completely non-viable. What happened? Did they go into denial about the bugs? Did they fix the bugs but the industry still doesn't trust them?


It's roughly that the AMD tech works reasonably well on HPC and less convincingly on "normal" hardware/systems. So a lot of AMD internal people think the stack is solid because it works well on their precisely configured dev machines and on the commercially supported clusters.

Other people think it's buggy and useless because that's the experience on some other platforms.

This state of affairs isn't great. It could be worse but it could certainly be much better.


If we're extremely lucky they might invest in SYCL and we'll see an Intel/AMD open-source teamup


This seems like the option that would make the most sense. If developers can "write once, run everywhere", they might as well do that instead of Cuda. But if they have to "write once, run on Intel, or AMD, or Nvidia", why would they bother with anything other than Nvidia considering their market share? If you're an underdog you go for open standards that makes it easy to switch to your products, but it seems like AMD have seen Nvidia's Cuda and jealously decided they wanted their own version, but 15 years too late.


> Qualcomm and other ARM manufacturers creaming them on AI stuff

That's mostly on Microsoft's DirectML though. I'm not sure whether AMD's implementation is based on ROCm (doubt it).


You do know that Microsoft, Oracle, Meta are all in on this right?

Heck I think it is being used to run ChatGPT 3.5 and 4 services.


I feel like people forget that AMD has huge contracts with Microsoft, Valve, Sony, etc to design consoles at scale. It's an invisible provider as most folks don't even realize their Xbox and their Playstation are both AMD.

When you're providing semi-custom chip designs at that scale, it makes a lot more sense that companies would be willing to try a more affordable alternative to Nvidia hardware.

My bet is that AMD figures out a serviceable solution for some (not all) workloads that isn't groundbreaking, but is affordable to the clients that want an alternative. That's usually how this goes for AMD, in my experience.


If you read/listen to the Stratechery interview with Lisa Su, she spelled out being open to customizing AMD hardware to meet partners' needs. So if Microsoft needs more memory bandwidth and less compute, AMD will build something just for them based on what they have now. If Meta wants 10% less power consumption (and cooling) for a 5% hit in compute, AMD will hear them out too. We'll see if that hardware customization strategy works outside of consoles.


It certainly helps differentiate from NVIDIA's "Don't even think about putting our chips on a PCB we haven't vetted" approach.


Yeah, but they will be using internal Microsoft and Meta software stacks, nothing that will dent CUDA.


>I feel like people forget that AMD has huge contracts with Microsoft, Valve, Sony, etc to design consoles at scale.

Nobody forgot that; it's just that those console chips are super low margin, which is why Intel and Nvidia stopped catering to that market after the Xbox/PS3 generations, and only AMD took it up because they were broke and every penny mattered to them.

Nvidia did a brief stint with the Shield/Switch because they were trying to get into the Android/ARM space, and also kinda gave up due to the margins.


A market that keeps being discussed as reaching its end, as newer generations aren't that much into traditional game consoles, and both Sony and Microsoft[0] have to reach out to PCs and mobile devices to achieve sales growth.

Among the gamer community, the discussion of this being the last console generation keeps popping up.

[0] - Nintendo is more than happy to keep redoing their hit franchises on good-enough hardware.


On the other hand, AMD has had a decade of watching CUDA eat their lunch and done basically nothing to change the situation.


AMD tries to compete in hardware with Intel’s CPUs and Nvidia’s GPUs. They have to slack somewhere, and software seems to be where. It isn’t any surprise that they can’t keep up on every front, but it does mean they can freely bring in partners whose core competency is software and work with them without any caveats.

Not sure why they haven’t managed to execute on that yet, but the partners must be pretty motivated now, right? I’m sure they don’t love doing business at Nvidia’s leisure.


Hardware is useless without software to show it off.


when was the last time AMD hardware was keeping up with NVIDIA? 2014?


Been a while since AMD had the top tier offering, but it has been trading blows in the middle tier segment the entire time. If you are just looking for a gamer card (ie not max AI performance), the AMD is typically cheaper and less power hungry than the equivalent Nvidia.


It’s trading blows because AMD sells their cards at lower margins in the midrange and Nvidia lets them.


But, the fact that Nvidia cards command higher margins also reflects their better software stack, right? Nvidia “lets them” trade blows in the midrange, or, equivalently, Nvidia is receiving the reward of their software investments: even their midrange hardware commands a premium.


> the AMD is typically cheaper and less power hungry than the equivalent Nvidia

cheaper is true, but less power hungry is absolutely not true, which is kind of my point.


It was true with RDNA 2. RDNA 3 regressed on this a bit, supposedly there was a hardware hiccup that prevented them from hitting frequency and voltage targets that they were hoping to reach.

In any case they're only slightly behind, not crazy far behind like Intel is.


The MI300X sounds like it is competitive, haha


Competitive with the H100 for inference, which is a 2-year-old product and just one half of the ML story. The H200 (and potentially the B100) is the appropriate comparison, based on when each is in volume production.


I have read in a few places that Microsoft is using AMD for inference to run ChatGPT. If I recall they said the price/performance was better.

I'm curious if that's just because they can't get enough Nvidia GPUs or if the price/performance is actually that much better.


Most likely it really is better overall.

Think of it this way: AMD is pretty good at hardware, so there's no reason to think that the raw difference in terms of flops is significant in either direction. It may go in AMD's favor sometimes and Nvidia's other times.

What AMD traditionally couldn't do was software, so those AMD GPUs are sold at a discount (compared to Nvidia), giving you better price/performance if you can use them.

Surely Microsoft is operating GPUs at large enough scale that they can pay a few people to paper over the software deficiencies so that they can use the AMD GPUs and still end up ahead in terms of overall price/performance.


Something like Triton from Microsoft/OpenAI as a CUDA bypass? Or PyTorch/TensorFlow targeting ROCm without user intervention.

Or there's OpenMP or HIP. In extremis, OpenCL.

I think the language stack is fine at this point. The moat isn't in CUDA the tech; it's in code running reliably on Nvidia's stack, without things like stray pointers needing a machine reboot. Hard to know how far off robust ROCm is at this point.
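
To make the "CUDA bypass" idea concrete: a Triton kernel is plain Python, and in principle the same source targets either vendor's backend; whether it runs reliably on ROCm is exactly the open question. A minimal vector-add sketch (the usual tutorial pattern):

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, x + y, mask=mask)

    def add(x, y):
        out = torch.empty_like(x)
        n = out.numel()
        grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
        add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
        return out

    x = torch.randn(1 << 20, device="cuda")  # "cuda" also means HIP on ROCm builds
    y = torch.randn(1 << 20, device="cuda")
    assert torch.allclose(add(x, y), x + y)

Nothing in that source mentions a vendor; the question is only whether the compiler and runtime underneath hold up.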


The problem is that we all have a lot of FUD (for good reasons). It's on AMD to solve that problem publicly. They need to make it easier to understand what is supported so far and what's not.

For example, for bitsandbytes (a common dependency in the LLM world) there's a ROCm fork that the AMD maintainers are trying to merge in (https://github.com/TimDettmers/bitsandbytes/issues/107). Meanwhile, an Intel employee merged a change that added a common device abstraction (presumably usable by AMD + Apple + Intel etc.).

There's a lot of that right now: a super popular package that is CUDA-only navigating how to make it work correctly with any other accelerator. We just need more information on what is supported.


I remember years ago one of the AMD APUs had the CPU and GPU on the same die, and could exchange ownership of CPU and GPU memory with just a pointer change or some other small bit of accounting.

Has this returned? Because for dual GPU/CPU workloads (AlphaZero, etc.) that would deliver effectively “infinite bandwidth” between GPU and CPU. Using an APU of course gets you huge amounts of slowish memory, but being able to fling things around with abandon would be an advantage, particularly for development.


You don't need to change the pointer value. The GPU and the CPU have the same page table structures and both use the same pointer representation for "somewhere in common memory".

On the GPU there are additional pointer types for different local memory, e.g. LDS is a uint16_t indexing from zero. But even there you can still have a single pointer to "somewhere" and when you store to it with a single flat addressing instruction the hardware sorts out whether it's pointing to somewhere in GPU stack or somewhere on the CPU.

This works really well for tables of data. It's a bit of a nuisance for code as the function pointer is aimed at somewhere in memory and whether that's to some x86 or to some gcn depends on where you got the pointer from, and jumping to gcn code from within x86 means exactly what it sounds like.


I'm not sure it was "pointers", but it was some very low-cost way to change ownership of memory between the CPU and GPU.

They had some fancy marketing name for it at the time. But it wasn't on all chips; it should have been. Even if it was dog slow between a PCIe GPU and the CPU, the unified interface would have been the right way to go. It's also amenable to automated scheduling.

The point still stands though: I want entirely unified GPU and CPU memory.


The unified address space with moving pages between CPU and GPU on page fault works on some discrete GPU systems but it's a bit of a performance risk compared to keeping the pages on the same device.

Fundamentally if you've got separate blocks of memory tied together by pcie then it's either annoying copying data across or a potential performance problem doing it behind the scenes.

A single block of memory that everything has direct access to is much better. It works very neatly on the APU systems.


> Fundamentally if you've got separate blocks of memory tied together by pcie then it's either annoying copying data across or a potential performance problem doing it behind the scenes.

Well, as I said that's amenable to automated planning.

But what I really, really want is a nice APU with 512GB+ of memory that both the CPU and GPU can access willy nilly.


Yep, that's what I want too. The future is now.

The MI300A is an APU with 128GB on the package. They come in four-socket systems, so that's 512GB of cache-coherent machine with 96 fast x64 cores and many GCN cores. Quite like a node from El Capitan.

I'm delighted with the hardware and not very impressed with the GPU offloading languages for programming it. The GCN and x64 cores are very much equal peers on the machine, the asymmetry baked into the languages grates on me.

(on non-apu systems, moving the data around in the background such that the latency is hidden is a nice idea and horrendously difficult to do for arbitrary workloads)


Probably thinking of this https://en.m.wikipedia.org/wiki/Heterogeneous_System_Archite...

> Even if it was dog slow between PCIe GPU and CPU the unified interface would have been the right way to go

That is actually what happened. You can directly access pinned cpu memory over pcie on discrete gpus.
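
To illustrate that zero-copy path on a discrete GPU, here's a sketch using Numba's CUDA bindings (just as an example of the mechanism; the AMD/HSA equivalent differs in API): a mapped array lives in pinned host RAM but is directly addressable from GPU kernels, so there are no explicit copies, only PCIe traffic on each access.

    import numpy as np
    from numba import cuda

    @cuda.jit
    def scale(a, factor):
        i = cuda.grid(1)
        if i < a.size:
            a[i] *= factor

    # mapped_array allocates pinned host memory that is mapped into the GPU's
    # address space; the kernel reads/writes it over PCIe with no explicit copy.
    a = cuda.mapped_array(1 << 20, dtype=np.float32)
    a[:] = 1.0
    threads = 256
    blocks = (a.size + threads - 1) // threads
    scale[blocks, threads](a, 2.0)
    cuda.synchronize()
    print(a[:4])  # [2. 2. 2. 2.]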


I assume the MI300A APU also supports zero-copy. Because MI300X is a separate chip you necessarily have to copy data over PCIe to get it into the GPU.


One day someone will build a workstation around that chip. One day…


I'm surprised at the simplicity of the formula in the paragraph below. Could someone explain the relationship between model size, memory bandwidth and token/s as they calculated here?

> Taking LLaMA 3 70B as an example, in float16 the weights are approximately 140GB, and the generation context adds another ~2GB. MI300X’s theoretical maximum is 5.3TB/second, which gives us a hard upper limit of (5300 / 142) = ~37.2 tokens per second.


From Cheese (they don't have a HN account, so I'm posting for them):

Each weight is an FP16 float, which is 2 bytes of data, and you have 70B weights, so the total amount of data the weights take up is 140GB; then you have a couple of extra GBs for the context.

Then to figure out the theoretical tokens per second, you just divide the memory bandwidth, 5300GB/s in the MI300X's case, by the amount of data that the weights and context take up, so 5300/142, which is about 37 tokens per second.
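
In Python, the same arithmetic (just restating the numbers above):

    bytes_per_param = 2                  # float16
    params = 70e9                        # LLaMA 3 70B
    weights_gb = params * bytes_per_param / 1e9   # ~140 GB
    context_gb = 2                       # per the article
    bandwidth_gbs = 5300                 # MI300X theoretical peak

    # Each generated token has to stream all weights (plus context) from HBM
    # once, so bandwidth / bytes-read-per-token is the single-stream ceiling.
    tokens_per_s = bandwidth_gbs / (weights_gb + context_gb)
    print(round(tokens_per_s, 1))        # ~37.3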


So am I correct in understanding what they really mean is 37 full forward passes per second?

In which case, if the model weights are fitting in the VRAM and are already loaded, why does the bandwidth impact the rate of tok/s?


You have to get those weights from the RAM to the floating point unit. The bandwidth here is the rate at which you can do that.

The weights are not really reused. Which means they are never in registers, or in L1/L2/L3 caches. They are always in VRAM and always need to be loaded back in again.

However, if you are batching multiple separate inputs, you can reuse each weight on each input, in which case you may not be entirely bandwidth bound and this analysis breaks down a bit. Basically, you can't produce a single stream of tokens any faster than this rate, but you can produce more than one stream of tokens at this rate.


37 somethings per second doesn’t sound fast at all. You need to remember it’s 37 ridiculously difficult things per second.


AFAIK generating a single token requires reading all the weights from RAM. So 5300 GB/s total memory bandwidth / 142 GB weights = ~37.2 tokens per second.


That would be higher with batching, right? (5300 / 144) * 2 = ~73.6 and so on.


Good. If there is even a slight suspicion that the best value is team red in 5 or 10 years, then CUDA will look a lot less attractive already today.


> Taking LLaMA 3 70B as an example, in float16 the weights are approximately 140GB, and the generation context adds another ~2GB. MI300X’s theoretical maximum is 5.3TB/second, which gives us a hard upper limit of (5300 / 142) = ~37.2 tokens per second.

I think they mean 37.2 forward passes per second. And at 4008 tokens per second (from the "LLaMA3-70B Inference" chart) it means they were using a batch size of ~107 (if using that math, but probably not correct). Right?
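
Back-of-the-envelope, under that same (over-simplified) model where every sequence in the batch shares one read of the weights per forward pass:

    single_stream = 5300 / 142           # ~37.3 forward passes per second
    reported_throughput = 4008           # tokens/s from the article's chart
    implied_batch = reported_throughput / single_stream
    print(round(implied_batch))          # ~107, i.e. a batch on the order of 100

In reality the KV cache grows with batch size and sequence length, so the true batch is probably somewhat different, but it gives the right order of magnitude.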


So just out of curiosity, what does this thing cost?


Pricing is strictly NDA. AMD does not give it out.


The rumors say $20k. Nothing official though.


Fantastic to see.

The MI300X does memory bandwidth better than anything else by a ridiculous margin, up and down the cache hierarchy.

It did not score very well on global atomics.

So yeah, that seems about right. If you manage to light up the hardware, lots and lots of number crunching for you.


I wonder if the human body could grow artificial kidneys, so that I could just sell infinite kidneys and manage to afford a couple of these to do AI training on my own hardware.


why not infinite brains so you can have more computational power than these GPUs?


Apparently one of those costs around $15K. I don't know if you can buy just a couple or if they only sell them in massive batches, but in any case, how many human kidneys do you need to sell to get $30K?


It would be great to have real world inference benchmarks for LLMs. These aren't it.

That means e.g. 8xH100 with TensorRT-LLM / vLLM vs 8xMI300X with vLLM, running many concurrent requests with a reasonable number of input and output tokens, in both fp8 and fp16.

Most of the benchmarks I've seen had setups that no one would use in production. For example, running on a single MI300X or 2xH100 will likely be memory bound; you need to go to higher batch sizes (more VRAM) to be compute bound and properly utilize these. Or they benchmark requests with an unrealistically low number of input tokens.
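
As a sketch of a more representative setup (hypothetical model name and parameters; vLLM's Python API, which has both CUDA and ROCm builds):

    from vllm import LLM, SamplingParams

    # Tensor-parallel across all 8 accelerators in the node (8xMI300X or 8xH100),
    # fp16 weights; an fp8-quantized checkpoint would be the other configuration
    # worth comparing.
    llm = LLM(
        model="meta-llama/Meta-Llama-3-70B-Instruct",  # example model
        tensor_parallel_size=8,
        dtype="float16",
    )

    # Many concurrent requests with realistic prompt lengths, not a single short
    # one; vLLM batches these internally, which is what pushes the hardware into
    # the compute-bound regime.
    prompts = ["Summarize the following document: ..."] * 256
    params = SamplingParams(max_tokens=512, temperature=0.8)
    outputs = llm.generate(prompts, params)
    print(len(outputs), "completions")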


Would you like to sign up for free time on our system to do it "right"?

https://hotaisle.xyz/free-compute-offer/


Would be interesting to see a workstation based on the version with a couple x86 dies, the MI300A. Oddly enough, it’d need a discrete GPU.


Without first-class CUDA translation or cross compile, AMD is just throwing more transistors at the void


Given the number of people who need the compute but are only accessing it via APIs like HuggingFace's transformers library, which supports these chips, I don't really think that CUDA support is absolutely essential.

Most kernels are super quick to rewrite, and higher level abstractions like PyTorch and JAX make dealing with CUDA a pretty rare experience for most people making use of large clusters and small installs. And if you have the money to build a big cluster, you can probably also hire the engineers to port your framework to the right AMD library.
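
Concretely, most users of the transformers library never touch a kernel at all; something like the following (model name is just a placeholder) is the whole extent of their interaction with the accelerator, and it looks identical on a ROCm build of PyTorch:

    from transformers import pipeline

    # device_map="auto" (via accelerate) spreads the model over whatever
    # accelerators are visible; the CUDA-vs-ROCm distinction lives in the
    # underlying PyTorch build, not in this code.
    generator = pipeline(
        "text-generation",
        model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder, any causal LM
        device_map="auto",
    )
    print(generator("The MI300X has", max_new_tokens=32)[0]["generated_text"])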

The world has changed a lot!

The bigger challenge is that if you are starting up, why in the world would you give yourself the additional challenge of going off the beaten path? It's not just CUDA but the whole infrastructure of clusters and networking that really gives NVIDIA an edge, in addition to knowing that they are going to stick around in the market, whereas AMD might leave it tomorrow.


When buying a supercomputer, you negotiate support contracts so it doesn't matter if AMD leaves the day after they sign the contract, you've still got your supercomputer and support for it.


True, but that works only for the current round of hardware. NVIDIA will be around for the next decade to support future clusters, too.


I agree they need to work on their software, but I also think that, given the limited availability and massive expense of the H100, AMD can undercut Nvidia and build a developer ecosystem if they want to. I think they need to hit the consumer market pretty hard and get all the local llama people hacking on the software and drivers to make things work. A cheaper, large-VRAM consumer card would go a long way toward getting a developer ecosystem behind them.


Have you looked at ZLUDA?

Edit: Or HIPIFY, a tool from AMD that translates CUDA source to HIP. https://github.com/ROCm/HIPIFY/blob/amd-staging/README.md


i worked there. they see software as a cost center, they should fix their mentality.


from the summary:

"When it is all said and done, MI300X is a very impressive piece of hardware. However, the software side suffers from a chicken-and-egg dilemma. Developers are hesitant to invest in a platform with limited adoption, but the platform also depends on their support. Hopefully the software side of the equation gets into ship shape. Should that happen, AMD would be a serious competitor to NVIDIA."


IMO Nvidia is going to force companies to fix this. Nvidia has always made it clear they will increase prices and capture 90% of your profits when left free to do so; see any example from the GPU vendor space. There isn't infinite money to be spent per token, so it's not like the AI companies can just increase prices.

That AMD can offer this product at a 40% discount and still make money tells you all you need to know.


I'm personally wondering when nVidia will open an AI AppStore, and every app that runs on nVidia hardware will have to be notarized first, and you'll have to pay 30% of your profits to nVidia.

History has shown that this idea is not as crazy as it sounds.


Oh, that is a great example: just wipe out Hugging Face, Ollama, Together.ai, and probably 20 more. They could host it all themselves or require their vendors to lease some time to them at cost.


OTOH, the performance advantage compared to the H100 is super-impressive according to tfa. Things could become interesting again in the GPU market.


AMD should be giving out these units to whichever clouds are willing to host them, so they can get them in the hands of developers to induce demand via working software


You are totally right.



