Given how many people need the compute but only access it through APIs like Hugging Face's transformers library, which supports these chips, I don't think CUDA support is absolutely essential.
Most kernels are quick to rewrite, and higher-level abstractions like PyTorch and JAX make touching CUDA directly a rare experience for most people, whether they're running large clusters or small installs. And if you have the money to build a big cluster, you can probably also hire the engineers to port your framework to the right AMD library.
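To make the abstraction point concrete, here's a minimal sketch of what framework-level code looks like: the script below is written against PyTorch's device abstraction, so the same code runs on an NVIDIA (CUDA) or AMD (ROCm) build of PyTorch without any vendor-specific kernel work. (Note that ROCm builds of PyTorch expose AMD GPUs through the same `torch.cuda` namespace, so `"cuda"` here is not NVIDIA-specific.)

```python
# Sketch: vendor-agnostic PyTorch code. No CUDA (or HIP) appears anywhere;
# the installed PyTorch build decides which backend actually runs.
import torch

def run_matmul(n: int = 64) -> torch.Tensor:
    # Pick whatever accelerator this PyTorch build exposes, falling back
    # to CPU. On ROCm builds, torch.cuda.is_available() also returns True.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    return a @ b

result = run_matmul()
print(result.shape, result.device)
```

This is why most cluster users never see CUDA: the porting burden falls on the framework maintainers, not on every team writing model code.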
The world has changed a lot!
The bigger challenge is that if you are starting up, why in the world would you give yourself the additional challenge of going off the beaten path? It's not just CUDA but the whole infrastructure of clusters and networking that gives NVIDIA an edge, plus the confidence that they will stick around in this market, whereas AMD might leave it tomorrow.
When buying a supercomputer, you negotiate support contracts, so it doesn't matter if AMD leaves the market the day after signing: you've still got your supercomputer and support for it.