AMD GPU Debugger (thegeeko.me)
161 points by ibobev 5 hours ago | 22 comments




Non-AMD, but Metal actually has a [relatively] excellent debugger and general dev tooling. It's why I prefer to do all my GPU work Metal-first and then adapt/port to other systems after that: https://developer.apple.com/documentation/Xcode/Metal-debugg...

I'm not like a AAA game developer or anything so I don't know how it holds up in intense 3D environments, but for my use cases it's been absolutely amazing. To the point where I recommend people who are dabbling in GPU work grab a Mac (Apple Silicon often required) since it's such a better learning and experimentation environment.

I'm sure it's linked somewhere there, but in addition to traditional debugging, you can actually emit formatted log strings from your shaders and they show up interleaved with your app logs. Absolutely bonkers.

The app I develop is GPU-powered on both Metal and OpenGL systems, and I haven't been able to find anything that comes near the quality of Metal's tooling in the OpenGL world. A lot of stuff is claimed to be equivalent, but as someone who has actively used both, I strongly feel it doesn't hold a candle to what Apple has done.


It's a full-featured and beautifully designed experience, and when it works it's amazing. However, it regularly freezes or hangs for me, and I've lost count of the number of times I've had to 'force quit' Xcode or it's just outright crashed. Also, for anything non-trivial it often refuses to profile, and I have to write a minimal repro to get it to capture anything.

I am writing compute shaders though, where one command buffer can run for seconds repeatedly processing over a 1GB buffer, and it seems the tools are heavily geared towards graphics work where the workload per frame is much lighter. (With all the AI focus, hopefully they'll start addressing this use case more.)


My initiation into shaders was porting some graphics code from OpenGL on Windows to PS5 and Xbox, and (for your NDA and devkit fees) they give you some very nice debuggers on both platforms.

But yes, when you're stumbling around a black screen, tooling is everything. Porting bits of shader code between syntaxes is the easy bit.

Can you get better tooling on Windows if you stick to DirectX rather than OpenGL?


There's also cuda-gdb[1], a first-party GDB for NVIDIA's CUDA. I've found it to be pretty good. Since CUDA uses a threading model, it maps well onto GDB's thread ergonomics (though IIRC you can only single-step at warp granularity, by the nature of SM execution).

[1] https://docs.nvidia.com/cuda/cuda-gdb/index.html


For NVIDIA cards, you can use Nsight. There's also RenderDoc, which works on a large number of GPUs.

nsys and nvtx are awesome.

Many don't know this, but you can use them without a GPU :)
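
For anyone curious what that looks like in practice, here's a minimal sketch using the nvtx package from PyPI (my own illustration, with made-up range names): the annotated ranges show up on the Nsight Systems timeline when the script is run under nsys profile, and the code itself runs fine on a machine with no GPU at all.

    # Minimal NVTX sketch (pip install nvtx). Run it under Nsight Systems, e.g.:
    #   nsys profile python script.py
    # The annotations add named ranges to the timeline and work even without a GPU.
    import time

    import nvtx

    @nvtx.annotate("load_data", color="blue")
    def load_data():
        time.sleep(0.1)  # stand-in for real CPU-side work

    with nvtx.annotate("main_loop", color="green"):
        for _ in range(3):
            load_data()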


Is there not an official tool from AMD?


It's worth noting that upstream gdb (and clang) are somewhat limited in GPU debugging support because they only use (and emit) standardized DWARF debug information. The DWARF standard will need updates before gdb and clang can reach parity with the AMD forks, rocgdb and amdclang, in terms of debugging support. It's nothing fundamental, but the AMD forks use experimental DWARF features and the upstream projects do not.

It's a little out of date now, but Lancelot Six had a presentation about the state of AMD GPU debugging in upstream gdb at FOSDEM 2024. https://archive.fosdem.org/2024/events/attachments/fosdem-20...


AMD's gdb (rocgdb) is an actual debugger, but it only works with applications that emit DWARF and use the amdkfd KMD, i.e. it doesn't work with graphics. All of the rest are not actual debuggers: UMR does support wave stepping, but it doesn't try to be a shader debugger so much as a tool for driver developers, and the AMD tools don't have any debugging capabilities.

> After searching for solutions, I came across rocgdb, a debugger for AMD’s ROCm environment.

It's like the 3rd sentence in the blog post.......


To be fair, it wasn't clear that was an official AMD debugger, and besides, that's only for debugging ROCm applications.

This sentence doesn't make any sense: a) ROCm is an AMD product, b) ROCm "applications" are GPU "applications".

But not all GPU applications are ROCm applications (I would think).

I can certainly understand OP's confusion. Navigating parts of the GPU ecosystem that are new to you can be incredibly confusing.


There are 2 AMD KMDs (kernel mode drivers) in Linux: amdkfd and amdgpu. Graphics applications use amdgpu, which is not supported by amdgdb. amdgdb also has the limitation of requiring DWARF, and the Mesa/AMD UMDs don't generate that.

Tangent: is anyone using a 7900 XTX for local inference/diffusion? I finally installed Linux on my gaming PC, and about 95% of the time it is just sitting off collecting dust. I would love to put this card to work in some capacity.

I've been using it for a few years on Gentoo. There were challenges with Python two years ago, but over the past year it's stabilized and I can even do img2video.

Performance-wise, the 7900 XTX is still the most cost-effective way of getting 24 GB of VRAM that isn't a sketchy VRAM mod. And VRAM is the main performance barrier, since a local LLM of any useful size is going to barely fit in memory.
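
To put rough numbers on that (my own back-of-the-envelope, not the parent's): weight memory is roughly parameter count times bytes per weight, before you even count the KV cache and activations.

    # Back-of-the-envelope weight-memory estimate (illustrative figures of my own,
    # ignoring KV cache and activation overhead, which add more on top).
    def weight_vram_gb(params_billions: float, bits_per_weight: float) -> float:
        return params_billions * 1e9 * bits_per_weight / 8 / 1e9

    for params, bits in [(7, 16), (13, 8), (34, 4), (70, 4)]:
        print(f"~{weight_vram_gb(params, bits):.0f} GB for a {params}B model at {bits}-bit")
    # ~14, ~13, ~17 and ~35 GB respectively: a 4-bit 34B model leaves headroom on a
    # 24 GB card, while a 4-bit 70B model doesn't fit without offloading.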

Highly suggest checking out TheRock. There's been a big rearchitecting of ROCm to improve the UX/quality.

Full disclosure, I work at AMD, but making this comment in my personal capacity.


I bought one when they were pretty new and I had issues with ROCm (IIRC I was getting kernel oopses due to GPU OOMs) when running LLMs. It worked mostly fine with ComfyUI unless I tried to do especially esoteric stuff. From what I've heard lately though, it should work just fine.

I tested some image and text generation models, and generally things just worked after replacing the default torch libraries with AMD's ROCm variants.
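
If anyone wants to verify they ended up with the ROCm build rather than the default CUDA/CPU wheels, a quick sanity check along these lines works (my sketch, assuming the ROCm wheels are installed):

    # Quick sanity check that the ROCm build of PyTorch sees the card.
    import torch

    print(torch.__version__)            # ROCm builds typically carry a "+rocm..." suffix
    print(torch.version.hip)            # HIP version string on ROCm builds, None otherwise
    print(torch.cuda.is_available())    # ROCm devices are exposed through the torch.cuda API
    if torch.cuda.is_available():
        print(torch.cuda.get_device_name(0))  # e.g. the 7900 XTX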

I've done it with a 6800XT, which should be similar. It's a little trickier than with an Nvidia card (because everything is designed for CUDA) but doable.

You'd be much better off with any decent Nvidia card than with the 7900 series.

AMD doesn't have a unified architecture across graphics and compute like Nvidia does.

AMD compute cards are sold under the Instinct line and are vastly more powerful than their consumer GPUs.

Supposedly, they are moving back to a unified architecture in the next generation of GPU cards.


Try it with ramalama[1]. Worked fine here with a 7840U and a 6900 XT.

[1] https://ramalama.ai/



