AMD gatekeeps this functionality behind its non-consumer cards. They don't realize that having a consumer card and being able to develop on it is a gateway to using AMD. I can use CUDA on any Nvidia card I buy. I can't believe they are so incredibly dense on this.
You can run inference and training on consumer AMD cards today. It works fine, including llama.cpp, stable diffusion, hugging face transformers, etc. Way cheaper for a given performance/VRAM target as well.
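As a rough sketch of what that looks like (assuming a ROCm build of PyTorch, which exposes the AMD GPU through the torch.cuda namespace, and using a small placeholder model):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # On a ROCm build of PyTorch the AMD GPU shows up under torch.cuda,
    # so this is the same code path you'd run on an Nvidia card.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    tok = AutoTokenizer.from_pretrained("gpt2")  # placeholder model id
    model = AutoModelForCausalLM.from_pretrained("gpt2").to(device)

    inputs = tok("Consumer AMD cards can run", return_tensors="pt").to(device)
    out = model.generate(**inputs, max_new_tokens=20)
    print(tok.decode(out[0], skip_special_tokens=True))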
With Nvidia cards, I know that if I buy any Nvidia card made in the last 10 years, CUDA code will run on it. Period. (Yes, different language levels require newer hardware, but Nvidia docs are quite clear about which CUDA versions require which silicon.) I have an AMD Zen3 APU with a tiny Vega in it; I ought to be able to mess around with HIP with ~zero fuss.
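If you're not sure what a given install is actually targeting, a quick sanity check from Python (a sketch; torch.version.cuda is populated on CUDA builds of PyTorch, torch.version.hip on ROCm builds):

    import torch

    # Reports which backend this PyTorch build was compiled against.
    print("CUDA runtime:", torch.version.cuda)  # None on a ROCm build
    print("HIP runtime:", torch.version.hip)    # None on a CUDA build

    if torch.cuda.is_available():
        print("Device:", torch.cuda.get_device_name(0))
        # On Nvidia this is the compute capability, e.g. (8, 6) for consumer
        # Ampere; Nvidia's docs map these to minimum CUDA versions.
        print("Capability:", torch.cuda.get_device_capability(0))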
The will-they-won't-they and the rapidly dropped support are hurting the otherwise excellent ROCm and HIP projects. There is a huge API surface to implement, and it looks like they're making rapid gains.
The article is specifically about AI. Don't most useful LLMs require too much VRAM for consumer Nvidia cards, and also often need those newer features, making it irrelevant that a G80 could run some sort of CUDA code?
I'm not particularly optimistic that ecosystem support will ever pan out for AMD to be viable but this seems to be giving a bit too much credit to Nvidia for democratizing AI development, which is a stretch.
First of all, LLMs are not the only AI in existence. A lot of ML, stats, and compute can be run on consumer-grade GPUs. There are plenty of problems where an LLM isn't even applicable.
Second, you absolutely can run and fine-tune many open source LLMs on one or more 3090s at a time.
But being able just to tinker, learn to write code, etc. on a consumer GPU is a gateway to the more compute-focused cards.
I 100% agree with that. The override env var (HSA_OVERRIDE_GFX_VERSION) is also buried deep in their documentation. NVIDIA is eating AMD's breakfast with RTX 3060s while they are trying to peddle 7900XTs.
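For anyone who hasn't seen it, the override just tells the ROCm runtime to treat your card as a nearby officially supported gfx target. A minimal sketch (the 10.3.0 value is only an example, for treating an unsupported RDNA2 part as gfx1030; the right value depends on your GPU):

    import os

    # Must be set before torch (and the ROCm runtime) loads.
    # Example value only: pretends an unsupported RDNA2 card is gfx1030.
    os.environ["HSA_OVERRIDE_GFX_VERSION"] = "10.3.0"

    import torch
    if torch.cuda.is_available():
        print("ROCm sees:", torch.cuda.get_device_name(0))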
Pretty sure my Radeon R9 285 would work if I forced the gfx802 offload arch when building for ROCm, but... what are you going to do with a decade-old card's worth of VRAM? 2GB is not enough for anybody.
That's something that's started changing over the last few months. Official support for the RX 7900 GPUs for Linux has been added to the most recent versions of ROCm and over on the ROCm subreddit people are reporting success getting other RDNA 3 cards working. On Windows you've got consumer cards from the previous generation getting official support too.
This is, obviously, way overdue and it might not be enough to let AMD get back into the race, but it's a step in the right direction.
In my eyes, the real problem is that there is no cost-effective developer access to high-end cards, like the MI300X. This breaks the developer flywheel that consumer cards would normally feed.
Where can you rent time on one? Traditionally, AMD has only helped build supercomputers, like Frontier and El Capitan, out of these cards.
This time around, Azure [0] and other CSPs (cloud service providers) are working to change that. They will have the best of the best of these cards/systems for rent soon.
lol, this is so stupid. Don't they realise that people usually develop locally and train on a server? You don't need a super beefy GPU to do that, so you buy Nvidia. So people get used to Nvidia, debug and fix bugs, etc. It's not a very smart decision; it looks like the decision makers have no idea what's going on.
AMD's OpenCL runtime also historically had an incredible number of bugs and paper-only features that made any sort of portability between Nvidia and AMD cards quite difficult - you were running a special build and a different codepath for AMD anyway, so there was no gain from using the ostensibly portable approach.
The gain is that you can't find time on NVIDIA cards right now. Decentralizing away from a single provider lowers the risk on your business significantly.
Case in point, OpenAI closed new signups because they couldn't keep up with demand and they literally have all the resources in the world to make things happen.
No I didn't read it fully, I knew what the concept was called, found an article, skimmed it until I was sure it was talking about the thing I thought it was. I'm almost positive I was taught in school that "it's" is a valid possessive apostrophe case and honestly I think it's stupid that it's not. I find "its" more confusing personally.
>I knew what the concept was called, found an article, skimmed it until I was sure it was talking about the thing I thought it was
You just managed to summarise why having conversations with strangers is so difficult on the internet these days.
Instead of considering that you were incorrect, even for a moment, you sought an article that you thought would confirm the ideas that you already had. Without even actually reading it, you used it as evidence that you were correct all along.
Even though the article very clearly illustrates that you were mistaken.
You do understand that's not really what happened here right?
The concept IS called possessive apostrophe and literally until this minute I wasn't aware that "it's" isn't grammatically correct when the "it" in the sentence is being used possessively. I didn't just find an article I thought agreed with me and fire it off, I thought "it's" was valid and riffic didn't know about possessive apostrophes (which again, I had wrong in this case). I didn't look for "it's" in the article because that wasn't up for debate in my mind (again, I can't stress this enough, I was wrong about the usage, any comment I've ever made uses "it's" in this case because that's how I thought it was used), I was just looking for the concept as a whole to link. I got a minor piece wrong and you want to lump me in with everyone who picks the first article that "agrees" with them.
I'm not lumping you in with anyone, I merely commented on how you sent a link because you thought it advanced your position when it actually advanced the position of the person you were "correcting". It's a behaviour I see all the time now, from all sorts of people.
I certainly don't think less of you specifically for doing this.
I generally remember this bit of grammar as the apostrophe replacing the missing letters of the contraction (which is not needed for the possessive situation).
That is the confusing part: say you need to hire 100 people at $1 million per year comp to get the drivers to a good state. That's about a third of their quarterly profit, but it would probably double the revenue in a few years.
From what I've heard they aren't offering competitive salaries, so bringing people in at insane comps would probably destroy either existing teams (if there were no salary bumps to match) or budgets (if there were). Doubt you can do much with new hires in one year in such an environment; by the time you start seeing results it's probably too late to capitalize on this bubble.
More likely, they wait to see how the AI HW startups shake out and then acquire the ones that have anything worth paying for.
>From what I've heard they aren't offering competitive salaries
Seems like a common thing in hardware companies: they chronically underpay, which for some reason hardware/electrical engineers seem to accept, but that makes them a last choice for competent software engineers, who have much better-paying options.
Even if they could make a fab, it would still be a logistical nightmare to scale from 1-1,000 users. Meanwhile, my SaaS company could have 100,000 users thanks to the cloud, and I wouldn't even have to get up from my desk.
Non sequitur [1]. AMD not having a fab doesn't mean an EE can run a fab out of their garage.
It may be relatively easier for people to make new chips without needing AMD/Intel (see all of the FAANG companies making their own). But it's still companies with lots of money making new chips, not people in their garages.
It's not just drivers though. Nvidia has invested close to two decades into documentation, teaching materials, developer tooling and libraries for CUDA, plus all the work on gaining mindshare.
You could probably get 80% there by dedicating enough AMD developers to improving AMD support in existing AI frameworks and software, in parallel with improving drivers and whatever CUDA equivalent they are betting on right now. But it would need a massive concerted effort that few companies seem to be able to pull off (probably it's hard to align the company on the right goals)
Nvidia is a semiconductor company and they pay much better than AMD: https://www.indeed.com/companies/compare/Amd-vs-Nvidia-b78a5... . That's a big part of why they're so far ahead now and why their drivers are much better. If AMD wants to catch up to Nvidia within a reasonable timeframe, they wouldn't just need to match Nvidia's pay, they'd need to pay way better to attract the best people.
That's not such a big concern; LLMs run on all kinds of hardware today. It won't be that hard to make them work on AMD. Before 2020 we had much more architectural diversity.
What's insane is that AMD has been known for its shit drivers for over a decade now... and nothing has happened to address this. Like surely everyone internally knows it, all the execs know it, the board knows it, investors know it... but somehow it has never been addressed.
At this point it's almost like it has to be intentional, like some perceived tradeoff ingrained in the culture that generates shit software.
>At this point it's almost like it has to be intentional, like some perceived tradeoff ingrained in the culture that generates shit software.
They're underpaying their hardware engineers, and if they wanted to hire good software engineers they'd need to pay more, which would cause their hardware engineers to demand better pay too.
> The AMD Instinct M1300A APU was launched in January 2023 and blends a total of 13 chiplets, of which many are 3D stacked, creating a single chip package with 24 Zen 4 CPU cores fused with a CDNA 3 graphics engine and eight stacks of HBM3 memory totaling 128GB.
It's literally a typo (or a renamed SKU?) for the MI300A. So... the street is jumping on AMD because of a typo echoed by a ton of outlets?
I can attest that it works really well on my 6900 XT. Compiling CUDA kernels is merely a matter of using a #define shim. Also, provided you download the ROCm PyTorch (and force compatibility with HSA_OVERRIDE_GFX_VERSION), everything just works.
You can also run Stable Diffusion in cpu mode, if you don't mind it being slower. I have an NVIDIA card but it's not powerful enough to run it. I'm on Ubuntu.
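For reference, a CPU-only run with diffusers looks roughly like this (the model id is just a placeholder checkpoint; expect minutes per image instead of seconds):

    import torch
    from diffusers import StableDiffusionPipeline

    # fp32 on CPU; half precision is mainly a GPU optimization.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",  # placeholder model id
        torch_dtype=torch.float32,
    ).to("cpu")

    image = pipe("a watercolor of a lighthouse at dusk",
                 num_inference_steps=20).images[0]
    image.save("out.png")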
I don't know, but couldn't people use LLMs to drastically lower the cost of switching? Converting a codebase to use a different platform doesn't require creativity.
I think we shouldn't even try, because LLMs will simply design new hardware and software all on their own. All we need to do is sit and watch, while collecting UBI.
The stated claim was "According to the news and tech blogs AMD is always ahead of nvidia in all regards." That necessitates sources other than the parent article being discussed, because "always" and "in all regards" were invoked, not just AI.
> The other side of this claim: sales numbers of GPUs.
Your claim "But then, you have real life..." implies that the news sites you didn't cite are wrong in claiming AMD technical supremacy, not that sales numbers are off.
To restate the question: do you have news references stating that AMD is always and in all regards ahead of nvidia? And then do you have real life data points to prove that said news sites are wrong?