It's true, but also not. Nvidia is certainly producing a chip that nobody else can replicate (unless they're the likes of Google, and even Google isn't interested in doing so).
The CUDA moat is the same kind of moat as Intel's x86 instruction set: plenty of existing programs and software stacks have been written against it, and the cost of migrating away is high. LLM pipelines are similar, and even more costly to migrate.
But because LLMs are still immature (it's only been about three years!), there's still room to move the instruction set. And middleware libraries can help: PyTorch, for example, has more than just the CUDA backend, even if the others are a bit less mature.
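To make that concrete, here's a minimal sketch of what the middleware indirection looks like in practice (assuming a stock PyTorch install; notably, AMD's ROCm build of PyTorch answers to the "cuda" device string, so even "CUDA" code can be less vendor-bound than it looks):

    import torch

    # Pick whichever accelerator backend this PyTorch build supports.
    # (AMD's ROCm build of PyTorch also reports itself as "cuda".)
    if torch.cuda.is_available():
        device = torch.device("cuda")          # NVIDIA CUDA, or AMD ROCm
    elif torch.backends.mps.is_available():
        device = torch.device("mps")           # Apple Metal
    else:
        device = torch.device("cpu")

    # From here on, the model code never names the vendor again.
    model = torch.nn.Linear(1024, 1024).to(device)
    x = torch.randn(8, 1024, device=device)
    print(device, model(x).shape)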
The real moat Nvidia has is its hardware capability; CUDA is the disguise on top of it.
> The real moat Nvidia has is its hardware capability; CUDA is the disguise on top of it.
There is an immense amount of work behind the cuDNN libraries that outsiders keep ignoring.
These sorts of high-performance kernels are co-developed in close collaboration with the hardware architects designing the chip. Speaking of the hardware in isolation from the high-performance libraries reveals a deep misunderstanding of how the system was built. This is true of any mature vendor, not just Nvidia.
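For a sense of how deep that dependency runs: frameworks don't write these kernels themselves, they dispatch to cuDNN. A hedged sketch using PyTorch's real cuDNN knobs (the harness around them is just illustrative):

    import torch

    # torch.backends.cudnn.enabled and .benchmark are real PyTorch flags;
    # the loop below is only an illustrative harness.
    torch.backends.cudnn.enabled = True    # dispatch convolutions to cuDNN
    torch.backends.cudnn.benchmark = True  # autotune: benchmark several cuDNN
                                           # algorithms, cache the fastest one

    conv = torch.nn.Conv2d(64, 64, kernel_size=3, padding=1).cuda()
    x = torch.randn(32, 64, 128, 128, device="cuda")

    # The first call pays the autotuning cost; later calls with the same
    # shapes reuse the selected cuDNN kernel.
    for _ in range(3):
        y = conv(x)
    torch.cuda.synchronize()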
NVLink says hello. Then rack-scale NVLink says hello...
Nobody can touch it. And that's just the hardware. The software is so much better on Nvidia. The breadth and depth of their offering is great, and nobody is even close.
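For what the interconnect buys you in practice, here's a hedged sketch of a multi-GPU all-reduce: NCCL routes this over NVLink when the GPUs are NVLink-connected (the script and launch command are illustrative, assuming a torchrun-style launcher):

    import os
    import torch
    import torch.distributed as dist

    # Sketch: sum a tensor across all GPUs in the node with NCCL, which
    # uses NVLink between NVLink-connected GPUs.
    # Launch (illustrative): torchrun --nproc_per_node=8 allreduce_demo.py
    def main():
        dist.init_process_group(backend="nccl")
        local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
        torch.cuda.set_device(local_rank)

        # Each rank contributes a tensor filled with its rank id.
        t = torch.full((1024,), float(dist.get_rank()), device="cuda")
        dist.all_reduce(t, op=dist.ReduceOp.SUM)

        if dist.get_rank() == 0:
            # With 8 ranks, each element is 0+1+...+7 = 28.
            print("per-element sum of ranks:", t[0].item())
        dist.destroy_process_group()

    if __name__ == "__main__":
        main()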
> Ultra Accelerator Link (UALink) is an open specification for a die-to-die interconnect and serial bus between AI accelerators. It is co-developed by Alibaba, AMD, Apple, Astera Labs, AWS, Cisco, Google, Hewlett Packard Enterprise, Intel, Meta, Microsoft and Synopsys.