If you're looking for fair comparisons, don't ask nVidia's marketing department; those guys are worse than Intel.
What AMD did was a straight comparison, while nvidia is applying their Transformer Engine, which modifies and optimizes some of the computation down to FP8, and they claim no measurable change in output. So yes, nvidia has some software tricks up their sleeve and that makes comparisons hard, but the fact remains that their best hardware can't match the MI300X in raw power. Given some time, AMD can apply the same software optimizations, or one of their partners will.
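To make the FP8 point concrete, here's a toy sketch (not NVIDIA's Transformer Engine, just a plain round-trip through the e4m3 FP8 format; torch.float8_e4m3fn needs a fairly recent PyTorch, and the shapes/values are arbitrary) showing the kind of numeric error that's being argued about:

    import torch

    a = torch.randn(1024, 1024)              # fp32 baseline inputs
    b = torch.randn(1024, 1024)
    ref = a @ b                               # fp32 reference result

    # Round-trip the inputs through FP8 (e4m3), then matmul again in fp32.
    a8 = a.to(torch.float8_e4m3fn).float()
    b8 = b.to(torch.float8_e4m3fn).float()
    approx = a8 @ b8

    rel_err = (ref - approx).abs().max() / ref.abs().max()
    print(f"max relative error after FP8 round-trip: {rel_err:.4f}")

The "no measurable change in output" claim is essentially that errors like this don't move the model's final outputs in any way you'd notice in a benchmark.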
I think AMD will likely hold the hardware advantage for a while: nVidia doesn't have any product that uses chiplets, while AMD has been developing that technology for years. If the trend toward these huge AI chips continues, AMD is better positioned to scale them economically.
Not my area, but isn't a lot of NVIDIA's edge over AMD precisely software? NVIDIA seem to employ a lot of software devs (for a hardware company) and made CUDA into the de facto standard for much ML work. Do you know if AMD are closing that gap?
They have improved their software significantly in the last year, but there is a movement that's broader than AMD that wants to get rid of CUDA.
The entire industry is motivated to break the nvidia monopoly. The cloud providers, various startups and established players like Intel are building their own AI solutions. At the same time, CUDA is rarely used directly; most work goes through a higher-level (Python) API that can target any low-level backend like CUDA, PTX or ROCm.
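As a rough illustration of that point (just a sketch, assuming a CUDA or ROCm build of PyTorch is installed): the same user code runs on either backend, because ROCm builds expose the GPU through the same torch.cuda namespace:

    import torch

    # One codepath, two backends: works unchanged on NVIDIA (CUDA) or AMD (ROCm/HIP).
    device = "cuda" if torch.cuda.is_available() else "cpu"

    x = torch.randn(256, 256, device=device)
    y = (x @ x.T).relu()

    # torch.version.hip is set on ROCm builds, torch.version.cuda on CUDA builds.
    print(device, torch.version.cuda, getattr(torch.version, "hip", None))
    print(y.sum().item())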
What AMD is lacking right now is decent ROCm support for their consumer cards across all platforms. Right now, if you don't have one of these MI cards or an RX 7900 and you're not running Linux, you're not going to have a nice time. I believe the reason is that they maintain two different architectures: CDNA (the MI cards) and RDNA (the consumer hardware).
That's what I have (RX 7900XT on Arch), and ROCm with pytorch has been reasonably stable so far. Certainly more than good enough for my experimentation. Pytorch itself has official support and things are pretty much plug & play.
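For what it's worth, for RDNA cards that aren't on the official support list, the usual community workaround (unofficial, unsupported by AMD, may or may not work) is to override the reported gfx target before the runtime loads, e.g.:

    import os

    # Unofficial workaround: pretend the card is a nearby supported gfx target.
    # Must be set before the HIP runtime initializes (i.e. before importing torch).
    # 11.0.0 ~ RDNA3 family, 10.3.0 ~ RDNA2 family. No guarantees.
    os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "11.0.0")

    import torch
    print(torch.cuda.is_available())  # ROCm builds report the GPU through torch.cuda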
> Given some time, AMD can apply the same software optimizations, or one of their partners will.
Except they have been given time, lots of it, and yet AMD is nowhere close to parity with CUDA. It's almost like you can't just snap your fingers and willy-nilly replicate the billions of dollars and decades of investment that went into CUDA.
That was a year ago. AMD is changing their software ecosystem at a rapid pace, with AI software as the #1 priority. Experienced engineers have been reassigned from legacy projects to focus on it. They've bought a number of startups that were already developing software in this space. It also looks like they've replaced the previous top-level AMD management with directors from Xilinx to re-energize the team.
To get a picture of the current state, which has changed a lot, this MS Ignite presentation from three weeks ago may be of interest. The slides show the drop-in compatibility they have at the higher levels of the stack and the translation tools at the lower levels. Finally, there's a live demo at the end.
Summary: they cherry-picked legacy nvidia SDKs and used Llama batch sizes that are rarely used in production...
https://twitter.com/karlfreund/status/1735078641631998271
https://developer.nvidia.com/blog/achieving-top-inference-pe...