AMD gatekeeps this functionality behind its non-consumer cards. They don't realize that having a consumer card and being able to develop on it is a gateway to using AMD. I can use CUDA on any Nvidia card I buy. I can't believe they are so incredibly dense on this.
You can run inference and training on consumer AMD cards today. It works fine, including llama.cpp, stable diffusion, hugging face transformers, etc. Way cheaper for a given performance/VRAM target as well.
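As a rough sketch of what that looks like (assuming a ROCm build of PyTorch, which exposes the AMD GPU through the torch.cuda namespace, and using a small placeholder model):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # On a ROCm build of PyTorch the AMD GPU shows up under torch.cuda,
    # so this is the same code path you'd run on an Nvidia card.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    tok = AutoTokenizer.from_pretrained("gpt2")  # placeholder model id
    model = AutoModelForCausalLM.from_pretrained("gpt2").to(device)

    inputs = tok("Consumer AMD cards can run", return_tensors="pt").to(device)
    out = model.generate(**inputs, max_new_tokens=20)
    print(tok.decode(out[0], skip_special_tokens=True))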
With Nvidia cards, I know that if I buy any Nvidia card made in the last 10 years, CUDA code will run on it. Period. (Yes, different language levels require newer hardware, but Nvidia docs are quite clear about which CUDA versions require which silicon.) I have an AMD Zen3 APU with a tiny Vega in it; I ought to be able to mess around with HIP with ~zero fuss.
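If you're not sure what a given install is actually targeting, a quick sanity check from Python (a sketch; torch.version.cuda is populated on CUDA builds of PyTorch, torch.version.hip on ROCm builds):

    import torch

    # Reports which backend this PyTorch build was compiled against.
    print("CUDA runtime:", torch.version.cuda)  # None on a ROCm build
    print("HIP runtime:", torch.version.hip)    # None on a CUDA build

    if torch.cuda.is_available():
        print("Device:", torch.cuda.get_device_name(0))
        # On Nvidia this is the compute capability, e.g. (8, 6) for consumer
        # Ampere; Nvidia's docs map these to minimum CUDA versions.
        print("Capability:", torch.cuda.get_device_capability(0))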
The will-they-won't-they and the rapidly dropped support are hurting the otherwise excellent ROCm and HIP projects. There is a huge API surface to implement, and it looks like they're making rapid gains.
The article is specifically about AI. Don't most useful LLMs require too much VRAM for consumer Nvidia cards, and also often need those newer features, making it irrelevant that a G80 could run some sort of CUDA code?
I'm not particularly optimistic that ecosystem support will ever pan out for AMD to be viable but this seems to be giving a bit too much credit to Nvidia for democratizing AI development, which is a stretch.
First of all, LLMs are not the only AI in existence. A lot of ML, stats, and compute can be run on consumer-grade GPUs. There are plenty of problems where an LLM isn't even applicable.
Second, you absolutely can run and fine-tune many open source LLMs on one or more 3090s at a time.
But being able just to tinker, learn to write code, etc. on a consumer GPU is a gateway to the more compute-focused cards.
I 100% agree with that. The override env var (HSA_OVERRIDE_GFX_VERSION) is also buried deep in their documentation. NVIDIA is eating AMD's breakfast with RTX 3060s while they are trying to peddle 7900XTs.
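For anyone who hasn't seen it, the override just tells the ROCm runtime to treat your card as a nearby officially supported gfx target. A minimal sketch (the 10.3.0 value is only an example, for treating an unsupported RDNA2 part as gfx1030; the right value depends on your GPU):

    import os

    # Must be set before torch (and the ROCm runtime) loads.
    # Example value only: pretends an unsupported RDNA2 card is gfx1030.
    os.environ["HSA_OVERRIDE_GFX_VERSION"] = "10.3.0"

    import torch
    if torch.cuda.is_available():
        print("ROCm sees:", torch.cuda.get_device_name(0))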
Pretty sure my Radeon R9 285 would work if I forced the gfx802 offload arch when building for ROCm, but... what are you going to do with a decade-old card's worth of VRAM? 2GB is not enough for anybody.
That's something that's started changing over the last few months. Official support for the RX 7900 GPUs for Linux has been added to the most recent versions of ROCm and over on the ROCm subreddit people are reporting success getting other RDNA 3 cards working. On Windows you've got consumer cards from the previous generation getting official support too.
This is, obviously, way overdue and it might not be enough to let AMD get back into the race, but it's a step in the right direction.
In my eyes, the real problem is that there is no cost-effective developer access to high-end cards, like the MI300X. This breaks the developer flywheel that consumer cards would normally feed.
Where can you rent time on one? Traditionally, AMD has only helped build supercomputers, like Frontier and El Capitan, out of these cards.
This time around, Azure [0] and other CSPs (cloud service providers) are working to change that. They will have the best of the best of these cards/systems for rent soon.
lol, this is so stupid. Don't they realise that people usually develop locally and train on a server? You don't need a super beefy GPU to do that, so you buy Nvidia. So people get used to Nvidia, debug and fix bugs, etc. It's not a very smart decision; it looks like the decision makers have no idea what's going on.
AMD's OpenCL runtime also historically had an incredible number of bugs and paper-only features that made any sort of portability between Nvidia and AMD cards quite difficult - you were running a special build and a different codepath for AMD anyway, so there was no gain from using the ostensibly portable approach.
The gain is that you can't find time on NVIDIA cards right now. Decentralizing away from a single provider lowers the risk on your business significantly.
Case in point, OpenAI closed new signups because they couldn't keep up with demand and they literally have all the resources in the world to make things happen.
No I didn't read it fully, I knew what the concept was called, found an article, skimmed it until I was sure it was talking about the thing I thought it was. I'm almost positive I was taught in school that "it's" is a valid possessive apostrophe case and honestly I think it's stupid that it's not. I find "its" more confusing personally.
>I knew what the concept was called, found an article, skimmed it until I was sure it was talking about the thing I thought it was
You just managed to summarise why having conversations with strangers is so difficult on the internet these days.
Instead of considering that you were incorrect, even for a moment, you sought an article that you thought would confirm the ideas that you already had. Without even actually reading it, you used it as evidence that you were correct all along.
Even though the article very clearly illustrates that you were mistaken.
You do understand that's not really what happened here right?
The concept IS called possessive apostrophe and literally until this minute I wasn't aware that "it's" isn't grammatically correct when the "it" in the sentence is being used possessively. I didn't just find an article I thought agreed with me and fire it off, I thought "it's" was valid and riffic didn't know about possessive apostrophes (which again, I had wrong in this case). I didn't look for "it's" in the article because that wasn't up for debate in my mind (again, I can't stress this enough, I was wrong about the usage, any comment I've ever made uses "it's" in this case because that's how I thought it was used), I was just looking for the concept as a whole to link. I got a minor piece wrong and you want to lump me in with everyone who picks the first article that "agrees" with them.
I'm not lumping you in with anyone, I merely commented on how you sent a link because you thought it advanced your position when it actually advanced the position of the person you were "correcting". It's a behaviour I see all the time now, from all sorts of people.
I certainly don't think less of you specifically for doing this.
I generally remember this bit of grammar as the apostrophe replacing the missing letters of the contraction (which is not needed for the possessive situation).
That is the confusing part: say you need to hire 100 people at $1 million per year comp to get the drivers to a good state. That's about a third of their quarterly profit, but it would probably double the revenue in a few years.
From what I've heard they aren't offering competitive salaries, so bringing people in at insane comps would probably destroy either existing teams (if there were no salary bumps to match) or budgets (if there were). Doubt you can do much with new hires in one year in such an environment; by the time you start seeing results it's probably too late to capitalize on this bubble.
More likely, they wait to see how the AI HW startups shake out and then acquire the ones that have anything worth paying for.
>From what I've heard they aren't offering competitive salaries
Seems like a common thing in hardware companies: they chronically underpay, which for some reason hardware/electrical engineers seem to accept, but that makes them a last choice for competent software engineers, who have much better-paying options.
Even if they could make a fab, it would still be a logistical nightmare to scale from 1-1,000 users. Meanwhile, my SaaS company could have 100,000 users thanks to the cloud, and I wouldn't even have to get up from my desk.
Non sequitur [1]. AMD not having a fab doesn't mean an EE can run a fab out of their garage.
It may be relatively easier for people to make new chips without needing AMD/Intel (see all of the FAANG companies making their own). But it's still companies with lots of money making new chips, not people in their garages.
It's not just drivers though. Nvidia has invested close to two decades into documentation, teaching materials, developer tooling and libraries for CUDA, plus all the work on gaining mindshare.
You could probably get 80% there by dedicating enough AMD developers to improving AMD support in existing AI frameworks and software, in parallel with improving drivers and whatever CUDA equivalent they are betting on right now. But it would need a massive concerted effort that few companies seem to be able to pull off (probably it's hard to align the company on the right goals)
Nvidia is a semiconductor company and they pay much better than AMD: https://www.indeed.com/companies/compare/Amd-vs-Nvidia-b78a5... . That's a big part of why they're so far ahead now and why their drivers are much better. If AMD wants to catch up to Nvidia within a reasonable timeframe, they wouldn't just need to match Nvidia's pay, they'd need to pay way better to attract the best people.
That's not such a big concern; LLMs run on all kinds of hardware today. It won't be that hard to make them work on AMD. Before 2020 we had much more architectural diversity.
What's insane is that AMD has been known for its shit drivers for over a decade now... and nothing has happened to address this. Like surely everyone internally knows it, all the execs know it, the board knows it, investors know it... but somehow it has never been addressed.
At this point it's almost like it has to be intentional, like some perceived tradeoff ingrained in the culture that generates shit software.
>At this point it's almost like it has to be intentional, like some perceived tradeoff ingrained in the culture that generates shit software.
They're underpaying their hardware engineers, and if they wanted to hire good software engineers they'd need to pay more, which would cause their hardware engineers to demand better pay too.
> The AMD Instinct M1300A APU was launched in January 2023 and blends a total of 13 chiplets, of which many are 3D stacked, creating a single chip package with 24 Zen 4 CPU cores fused with a CDNA 3 graphics engine and eight stacks of HBM3 memory totaling 128GB.
It's literally a typo (or a renamed SKU?) for the MI300A. So... the street is jumping on AMD because of a typo echoed by a ton of outlets?
I can attest that it works really well on my 6900 XT. Compiling CUDA kernels is merely a matter of using a #define shim. Also, provided you download the ROCm PyTorch (and force compatibility with HSA_OVERRIDE_GFX_VERSION), everything just works.
You can also run Stable Diffusion in cpu mode, if you don't mind it being slower. I have an NVIDIA card but it's not powerful enough to run it. I'm on Ubuntu.
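For reference, a CPU-only run with diffusers looks roughly like this (the model id is just a placeholder checkpoint; expect minutes per image instead of seconds):

    import torch
    from diffusers import StableDiffusionPipeline

    # fp32 on CPU; half precision is mainly a GPU optimization.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",  # placeholder model id
        torch_dtype=torch.float32,
    ).to("cpu")

    image = pipe("a watercolor of a lighthouse at dusk",
                 num_inference_steps=20).images[0]
    image.save("out.png")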
I don't know, but couldn't people use LLMs to drastically lower the cost of switching? Converting a codebase to use a different platform doesn't require creativity.
I think we shouldn't even try, because LLMs will simply design new hardware and software all on their own. All we need to do is sit and watch, while collecting UBI.
The stated claim was "According to the news and tech blogs AMD is always ahead of nvidia in all regards." That necessitates sources other than the parent article being discussed, because "always" and "in all regards" were invoked, not just AI.
> The other side of this claim: sales numbers of GPUs.
Your claim "But then, you have real life..." implies that the news sites you didn't cite are wrong in claiming AMD technical supremacy, not that sales numbers are off.
To restate the question: do you have news references stating that AMD is always and in all regards ahead of nvidia? And then do you have real life data points to prove that said news sites are wrong?