I've been looking for a course like this! Especially great given how much of the recent progress in training large models has been made possible by FlashAttention and fused kernels.
It certainly also involves generating code (e.g. for WebGPU or Vulkan), which is more akin to traditional compiler work, plus things like graph and memory optimization. So it's indeed more than packaging.