I found the introductory chapters (1-3) of this book[0] quite good. It is different from the NVIDIA CUDA C++ guide in that it uses modern C++ and has non-trivial real-world examples.
I also wrote a blog post [1] on using CUDA to write a simple CNN inference module, which you might find useful.
I kind of feel there are two different levels of software development for CUDA.
1. CUDA-level programming, for graphics processing etc., or for writing a C/C++ library for the PyTorch/TensorFlow frameworks.
2. PyTorch/TensorFlow-level coding (e.g. training a model): you just pick their CUDA-specific APIs and the two frameworks handle the rest under the hood. As far as I can tell, users don't need any of the CUDA-specific coding from point 1.
If you're interested in 1, NVIDIA has a C++ guide to download; if you're interested in 2, then the focus is on the AI framework rather than on CUDA.
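To give a rough idea of what level 1 looks like, here is a minimal vector-add sketch. It is my own illustration, not taken from the NVIDIA guide or from any framework. The names `vec_add`, `n`, `threads` and `blocks` are made up, the block size of 256 is an arbitrary choice, and error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: each GPU thread adds one pair of elements.
// (vec_add is a hypothetical name chosen for this sketch.)
__global__ void vec_add(const float* a, const float* b, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {                                    // guard the last, partial block
        out[i] = a[i] + b[i];
    }
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Unified (managed) memory keeps the sketch short; explicit
    // cudaMalloc + cudaMemcpy is the other common pattern.
    float *a, *b, *out;
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&out, bytes);

    for (int i = 0; i < n; ++i) {
        a[i] = 1.0f;
        b[i] = 2.0f;
    }

    // Assumed launch configuration: 256 threads per block,
    // and enough blocks to cover all n elements.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    vec_add<<<blocks, threads>>>(a, b, out, n);

    // Kernel launches are asynchronous; wait before reading the result on the host.
    cudaDeviceSynchronize();

    printf("out[0] = %f\n", out[0]);

    cudaFree(a);
    cudaFree(b);
    cudaFree(out);
    return 0;
}
```

This should build with something like `nvcc vec_add.cu -o vec_add` (the file name is just an example). At level 2 you never write any of this yourself: the framework ships its own kernels and launches them for you.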
I've always found writing CUDA kernels to be a bit unapproachable.