If you're looking for fair comparisons, don't ask nVidia's marketing department; those guys are worse than Intel.
What AMD did was a straight comparison, while nvidia is applying their Transformer Engine, which modifies and optimizes some of the computation down to FP8, and they claim no measurable change in output. So yes, nvidia has some software tricks up their sleeve and that makes comparisons hard, but the fact remains that their best hardware can't match the MI300X in raw power. Given some time, AMD can apply the same software optimizations, or one of their partners will.
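To make the FP8 point concrete, here's a toy sketch (not NVIDIA's Transformer Engine, just a plain round-trip through the e4m3 FP8 format; torch.float8_e4m3fn needs a fairly recent PyTorch, and the shapes/values are arbitrary) showing the kind of numeric error that's being argued about:

    import torch

    a = torch.randn(1024, 1024)              # fp32 baseline inputs
    b = torch.randn(1024, 1024)
    ref = a @ b                               # fp32 reference result

    # Round-trip the inputs through FP8 (e4m3), then matmul again in fp32.
    a8 = a.to(torch.float8_e4m3fn).float()
    b8 = b.to(torch.float8_e4m3fn).float()
    approx = a8 @ b8

    rel_err = (ref - approx).abs().max() / ref.abs().max()
    print(f"max relative error after FP8 round-trip: {rel_err:.4f}")

The "no measurable change in output" claim is essentially that errors like this don't move the model's final outputs in any way you'd notice in a benchmark.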
I think AMD will likely hold the hardware advantage for a while: nVidia doesn't have any product that uses chiplets, while AMD has been developing that technology for years. If the trend toward these huge AI chips continues, AMD is better positioned to scale them economically.
Not my area, but isn't a lot of NVIDIA's edge over AMD precisely software? NVIDIA seem to employ a lot of software devs (for a hardware company) and made CUDA into the de facto standard for much ML work. Do you know if AMD are closing that gap?
They have improved their software significantly in the last year, but there is a movement that's broader than AMD that wants to get rid of CUDA.
The entire industry is motivated to break the nvidia monopoly. The cloud providers, various startups and established players like Intel are building their own AI solutions. At the same time, CUDA is rarely used directly; most work goes through a higher-level (Python) API that can target any low-level backend like CUDA, PTX or ROCm.
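As a rough illustration of that point (just a sketch, assuming a CUDA or ROCm build of PyTorch is installed): the same user code runs on either backend, because ROCm builds expose the GPU through the same torch.cuda namespace:

    import torch

    # One codepath, two backends: works unchanged on NVIDIA (CUDA) or AMD (ROCm/HIP).
    device = "cuda" if torch.cuda.is_available() else "cpu"

    x = torch.randn(256, 256, device=device)
    y = (x @ x.T).relu()

    # torch.version.hip is set on ROCm builds, torch.version.cuda on CUDA builds.
    print(device, torch.version.cuda, getattr(torch.version, "hip", None))
    print(y.sum().item())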
What AMD is lacking right now is decent ROCm support for their consumer cards across all platforms. Right now, if you don't have one of these MI cards or an RX 7900 and you're not running Linux, you're not going to have a nice time. I believe the reason is that they maintain two different architectures: CDNA (the MI cards) and RDNA (the consumer hardware).
That's what I have (RX 7900XT on Arch), and ROCm with pytorch has been reasonably stable so far. Certainly more than good enough for my experimentation. Pytorch itself has official support and things are pretty much plug & play.
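For what it's worth, for RDNA cards that aren't on the official support list, the usual community workaround (unofficial, unsupported by AMD, may or may not work) is to override the reported gfx target before the runtime loads, e.g.:

    import os

    # Unofficial workaround: pretend the card is a nearby supported gfx target.
    # Must be set before the HIP runtime initializes (i.e. before importing torch).
    # 11.0.0 ~ RDNA3 family, 10.3.0 ~ RDNA2 family. No guarantees.
    os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "11.0.0")

    import torch
    print(torch.cuda.is_available())  # ROCm builds report the GPU through torch.cuda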
> Given some time, AMD can apply the same software optimizations, or one of their partners will.
Except they have been given time, lots of it, and yet AMD is nowhere close to parity with CUDA. It's almost like you can't just snap your fingers and willy-nilly replicate the billions of dollars and decades of investment that went into CUDA.
That was a year ago. AMD is changing their software ecosystem at a rapid pace, with AI software as the #1 priority. Experienced engineers have been reassigned from legacy projects to focus on it. They've bought a number of startups that were already developing software in this space. It also looks like they've replaced the previous top-level AMD management with directors from Xilinx to re-energize the team.
To get a picture of the current state, which has changed a lot, this MS Ignite presentation from three weeks ago may be of interest. The slides show the drop-in compatibility they have at the higher levels of the stack and the translation tools at the lower levels. Finally, there's a live demo at the end.
Summary: they cherry-picked legacy nvidia SDKs and used Llama batch sizes that are rarely used in production...
https://twitter.com/karlfreund/status/1735078641631998271
https://developer.nvidia.com/blog/achieving-top-inference-pe...