Developers weren't forced into using CUDA; they chose it because NVIDIA's ecosystem was much better than anyone else's.
Facebook and Google obviously wouldn't want to lock themselves into CUDA for PyTorch and Tensorflow, but there genuinely wasn't any other realistic option. OpenCL existed but the implementation on AMD was just as bad as the one on NVIDIA.
Consider that Blender's Cycles render engine only got OpenCL support after AMD assigned some devs specifically to help work through driver bugs, and even then they had to resort to a 'split kernel' hack. OpenCL support was recently dropped entirely because the situation hadn't really improved over the decade; instead, the CUDA version was ported to HIP and a HIP-capable Windows driver was released.
Even now, if you need to do GPGPU on PCs, CUDA is essentially the easiest option. Every NVIDIA card supports it pretty much right from launch on both Linux and Windows, while with AMD you currently get support on only a few distros (Windows support is probably not too far off now), slow support for new hardware, and a pattern of phasing out older cards even when they're still very common. On top of that, NVIDIA offers amazing profiling and debugging tools that the competition hasn't caught up to.
... no, just as any other consumer isn't necessarily "forced" by companies employing anticompetitive practices.
> Facebook and Google...
lol, they have such a high churn rate on hardware that I seriously doubt they'd give it much thought at all. Their use case is unique to a tiny number of companies - high churn, low capital constraint, no tolerance for supplier delay. In such a scenario CUDA vendor lock in wouldn't even register as a potential point of pain.
> OpenCL existed but the implementation on AMD was just as bad as the one on NVIDIA.
For those unaware of how opencl works: an API is provided by the standard, to which software can be written by people - even those without signed NDAs. The API can interface to a hardware endpoint that has further open code and generous documentation... like an open source DSP, CPU, etc - or it can hit an opaque pointer. If your hardware vendor is absurdly secretive and insists on treating microcode and binary blobs as competitive advantages, then your opencl experience is wholly dependent on that vendor implementation. Unfortunately for GPUs that means either NVIDIA or AMD (maybe Intel, we'll see)... so yeah - not good.

AMD has improved things by open sourcing a great deal of their code, but that is a relatively recent development. While I'm familiar with some aspects of their codebase (had to fix an endian bug, guess what ISA I use), I dunno how much GPGPU functionality they're still hiding behind their encrypted firmware binary blobs.

Also, to the point on NVIDIA's opencl sucking: anybody else remember that time that Intel intentionally crippled performance for non-Intel hardware running code generated by their compiler or linked to their high performance scientific libraries? Surely NVIDIA would never sandbag opencl...
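To make the first part of that concrete, here's a minimal sketch (assuming the Khronos headers and an ICD loader are installed; error handling omitted) of what "the API is provided by the standard" looks like in practice. The entry points are the standard's; everything behind them is whatever implementation the vendor shipped, open or opaque.

    // Minimal sketch: the API surface is the open Khronos standard,
    // but each call is dispatched into whichever vendor driver the
    // ICD loader finds on the system.
    #include <CL/cl.h>
    #include <stdio.h>

    int main(void) {
        cl_uint n = 0;
        clGetPlatformIDs(0, NULL, &n);          // standard entry point
        cl_platform_id platforms[16];
        if (n > 16) n = 16;
        clGetPlatformIDs(n, platforms, NULL);

        for (cl_uint i = 0; i < n; ++i) {
            char vendor[256] = {0};
            // The answer comes from the vendor's implementation,
            // open or opaque as the case may be.
            clGetPlatformInfo(platforms[i], CL_PLATFORM_VENDOR,
                              sizeof(vendor), vendor, NULL);
            printf("platform %u: %s\n", i, vendor);
        }
        return 0;
    }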
Anyway, this is kind of a goofy thing to even discuss given two facts:
* There are basically two GPU vendors - so vendor lock is practically assured already.
* CUDA is designed to run parallel code on NVIDIA GPUs - full stop. Opencl is designed for heterogeneous computing, and GPUs are just one of many computing units possible. So not apples to apples.
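A quick hedged sketch of that second point, since it's easy to gloss over: the same standard API enumerates CPUs, GPUs and other accelerators alike, a scope CUDA never even attempts. (Assumes an OpenCL SDK is present; error handling omitted.)

    #include <CL/cl.h>
    #include <stdio.h>

    // OpenCL targets heterogeneous devices: CPUs, GPUs, and other
    // accelerators are all queried through the same API.
    int main(void) {
        cl_platform_id platform;
        clGetPlatformIDs(1, &platform, NULL);

        struct { cl_device_type type; const char *name; } kinds[] = {
            { CL_DEVICE_TYPE_CPU,         "CPU"         },
            { CL_DEVICE_TYPE_GPU,         "GPU"         },
            { CL_DEVICE_TYPE_ACCELERATOR, "accelerator" },
        };
        for (int i = 0; i < 3; ++i) {
            cl_uint count = 0;
            clGetDeviceIDs(platform, kinds[i].type, 0, NULL, &count);
            printf("%s devices: %u\n", kinds[i].name, count);
        }
        return 0;
    }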
> CUDA is designed to run parallel code on NVIDIA GPUs - full stop. Opencl is designed for heterogeneous computing, and GPUs are just one of many computing units possible. So not apples to apples.
This is really why OpenCL failed. You can't write code that works just as well on CPUs as it does on GPUs. GPGPU isn't all that general purpose; it's still quite specialized in what it's actually good at and in the hoops you need to jump through to make it perform well.
That is CUDA's real strength. It's not the API or ecosystem or lock-in; it's that CUDA is all about a specific category of compute and isn't afraid to tell you all the nitty-gritty details you need to know to use it effectively. And you actually know where to go for complete documentation.
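To illustrate what "nitty-gritty" means here, a minimal CUDA sketch (sizes and data are arbitrary placeholders): you choose the launch geometry, stage data into on-chip __shared__ memory yourself, and place the barriers explicitly, because the model assumes you want that level of control.

    #include <cuda_runtime.h>
    #include <stdio.h>

    // CUDA exposes the machine: explicit block/thread indexing, on-chip
    // __shared__ memory you manage yourself, and explicit barriers.
    __global__ void block_sum(const float *in, float *out, int n) {
        __shared__ float tile[256];                 // one tile per block, on-chip
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
        __syncthreads();                            // everyone has loaded

        // Tree reduction within the block.
        for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
            if (threadIdx.x < stride)
                tile[threadIdx.x] += tile[threadIdx.x + stride];
            __syncthreads();
        }
        if (threadIdx.x == 0) out[blockIdx.x] = tile[0];
    }

    int main(void) {
        const int n = 1 << 20, block = 256, grid = (n + block - 1) / block;
        float *in, *out;
        cudaMalloc((void **)&in, n * sizeof(float));    // you manage device memory too
        cudaMalloc((void **)&out, grid * sizeof(float));
        // (device data left uninitialized; this only illustrates the mechanics)
        block_sum<<<grid, block>>>(in, out, n);         // explicit launch geometry
        cudaDeviceSynchronize();
        printf("launched %d blocks of %d threads\n", grid, block);
        cudaFree(in); cudaFree(out);
        return 0;
    }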
> There are basically two GPU vendors - so vendor lock is practically assured already.
Depends on how you scope your definition of "GPU vendor." If you only include cloud compute then sure, for now. If you include consumer devices then very definitely no, not at all. You also have Intel (Intel's integrated being the most widely used GPU on laptops, after all), Qualcomm's Adreno, ARM's Mali, IMG's PowerVR, and Apple's PowerVR fork. There's also Broadcom's VideoCore, still in use at the very low end like the Raspberry Pi and TVs.
CUDA is designed to support C, C++, and Fortran as first-class languages, with PTX bindings for anyone else that wants to join the party, including .NET, Java, Julia, and Haskell, among others.
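The "PTX bindings" route is worth spelling out, because it's the mechanism all those other languages use: the front end emits PTX, and the binding loads it through the driver API. A hedged host-side sketch (the PTX string and the kernel name are placeholders):

    #include <cuda.h>
    #include <stdio.h>
    // Uses the CUDA driver API: link against -lcuda.

    // Placeholder: a real front end (Julia, .NET, a JIT, ...) would emit
    // actual PTX text here; this string is just a stand-in.
    static const char *kernel_ptx = "// PTX emitted by some language front end";

    int main(void) {
        CUdevice dev; CUcontext ctx; CUmodule mod; CUfunction fn;
        cuInit(0);
        cuDeviceGet(&dev, 0);
        cuCtxCreate(&ctx, 0, dev);

        // This is the whole trick behind the PTX bindings: hand PTX to the
        // driver and let it JIT for whatever GPU is present.
        if (cuModuleLoadData(&mod, kernel_ptx) != CUDA_SUCCESS) {
            fprintf(stderr, "placeholder PTX, nothing real to load\n");
            return 1;
        }
        cuModuleGetFunction(&fn, mod, "my_kernel");   // hypothetical kernel name
        cuLaunchKernel(fn, 1, 1, 1,    // grid
                           32, 1, 1,   // block
                           0, NULL, NULL, NULL);      // no shared mem, stream, or args
        cuCtxSynchronize();
        cuModuleUnload(mod);
        cuCtxDestroy(ctx);
        return 0;
    }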
OpenCL was born as a C-only API and requires compilation at run time. The later additions of SPIR and C++ were an afterthought, made after they started to take a heavy beating. There is still no IDE or GPGPU debugging that compares to CUDA's, and OpenCL 3.0 is basically 1.2.
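For contrast, "requires compilation at run time" means the classic OpenCL flow ships the kernel as source text and the end user's driver compiles it on the fly, roughly like this (a hedged sketch with a trivial kernel; most error handling omitted):

    #include <CL/cl.h>
    #include <stdio.h>

    // The classic OpenCL 1.x flow: kernel source travels as a string and is
    // compiled by the vendor's driver at run time, on the user's machine.
    static const char *src =
        "__kernel void scale(__global float *x, float k) {"
        "    x[get_global_id(0)] *= k;"
        "}";

    int main(void) {
        cl_platform_id platform; cl_device_id device;
        clGetPlatformIDs(1, &platform, NULL);
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, NULL);
        cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);

        // Compilation happens here, inside the vendor's driver.
        cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
        if (clBuildProgram(prog, 1, &device, "", NULL, NULL) != CL_SUCCESS) {
            char log[4096] = {0};
            clGetProgramBuildInfo(prog, device, CL_PROGRAM_BUILD_LOG,
                                  sizeof(log), log, NULL);
            fprintf(stderr, "driver's compiler said:\n%s\n", log);
            return 1;
        }
        cl_kernel k = clCreateKernel(prog, "scale", NULL);
        printf("kernel compiled by the driver at run time\n");
        clReleaseKernel(k); clReleaseProgram(prog); clReleaseContext(ctx);
        return 0;
    }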
>lol, they have such a high churn rate on hardware that I seriously doubt they'd give it much thought at all. Their use case is unique to a tiny number of companies - high churn, low capital constraint, no tolerance for supplier delay. In such a scenario CUDA vendor lock in wouldn't even register as a potential point of pain
Considering that PyTorch and Tensorflow are the two most popular deep learning frameworks used in the industry, this argument doesn't make sense. Of course they care about CUDA lock-in, it makes them dependent on a competitor and limits the range of hardware they support and thus potentially limits the adoption of their framework. The fact that they chose CUDA anyway is essentially confirmation that they didn't see any other viable option.
>Also, to the point on NVIDIA's opencl sucking: anybody else remember that time that Intel intentionally crippled performance for non-Intel hardware running code generated by their compiler or linked to their high performance scientific libraries? Surely NVIDIA would never sandbag opencl...
If NVIDIA were somehow intentionally crippling OpenCL performance on non-NVIDIA hardware, it would be pretty obvious since they don't control all the OpenCL compilers/runtimes out there. They very likely were crippling OpenCL on their own hardware, but that obviously wouldn't matter if the competitors (as you mentioned, OpenCL was designed for heterogeneous compute in general, so there would have been competition from more than just AMD) had a better ecosystem than CUDA's.
>For those unaware of how opencl works: an API is provided by the standard, to which software can be written by people - even those without signed NDAs
And no one has made it work as well as CUDA - developers that want performance will choose CUDA. If OpenCL worked as well people would choose it, but it simply doesn't.
>I seriously doubt they'd give it much thought at all.
Having talked to people at both companies about exactly this, they have put serious thought into it - it amounts to powering their multi-billion dollar cloud AI infrastructure. The alternatives are simply so bad that they choose CUDA/NVidia, as do their clients. Watching them (and AWS and MS) choose NVidia for their cloud offerings is not because they're all stupid or cannot make new APIs if needed - they choose it because it works.
>Surely NVIDIA would never sandbag opencl...
So fix it. There are enough people who can and do reverse engineer such things that someone would likely have found such a conspiracy by now. Or publish the proof. Reverse engineering is not so hard that, if this mythical problem existed, you could not find it, prove it, and write it up, or even fix it. There are enough companies besides NVidia that could fix OpenCL, or make a better API for NVidia hardware and sell that, yet neither has happened. If you really believe it is possible, you are sitting on a huge opportunity.
Or, alternatively, NVidia has made really compelling hardware and the best software API so far, and people use that because it works.
Open source fails at many real world tasks. Choose the tool best suited to solve the problem you want solved, regardless of religious beliefs.
> Choose the tool best suited to solve the problem you want solved, *regardless of religious beliefs*.
...is nonsense. Open source isn't about "religion"; it's about actually being able to do something like...
> So fix it.
...without needing to do stuff like...
> do reverse engineer such things
...which is a pointless waste of time regardless of how "not that hard" it might be (it certainly isn't easy, and it's certainly much easier when you have the source code around).
This association of open source / free software with religion has no place here. People didn't come up with open source / free software because of some mystical experience with otherworldly entities; they came up with it because they were faced with actual practical issues.
OP complains people use CUDA instead of a non-existent open source solution.
That's religion.
And a significant amount of open source solutions are the result of reverse engineering. It's a perfectly reasonable and time tested method to replace proprietary solutions.
> they came up with it because they were faced with actual practical issues
People use CUDA for actual practical issues. If someone makes a cross platform open source solution that solves those issues people will try it.
First of all, I replied to the generalization "Open source fails at many real world tasks. Choose the tool best suited to solve the problem you want solved, regardless of religious beliefs", which is not just about CUDA. Open source might fail at tasks, but it isn't pushed or chosen because of religion; it has nothing to do with religion. In fact...
> OP complains people use CUDA instead of a non-existent open source solution. That's religion.
...that isn't religion either. The person you replied to complains because CUDA is not only closed source but also vendor locked to Nvidia, and both of those carry a ton of issues, largely around control; the complaint comes from those issues. For many people these issues are either showstoppers or reasons to look for, wish for, and push for alternatives, and they come from practical concerns, not religious ones.
> And a significant amount of open source solutions are the result of reverse engineering. It's a perfectly reasonable and time tested method to replace proprietary solutions.
It is not reasonable at all. It is the last-ditch effort when nothing else will do, it can be a tremendous waste of time, and telling people "so fix it" when doing so would require reverse engineering is practically the same as telling them to shut up; IMO it can't be taken seriously as anything else.
The proper way to fix something is to have access to the source code.
And again to be clear:
> People use CUDA for actual practical issues. If someone makes a cross platform open source solution that solves those issues people will try it.
The "actual practical issues" i mentioned have nothing to do with CUDA or any issues they might use with CUDA or any other closed source (or not) technology. The "actual practical issues" i mentioned are about the issues inherent to closed source technologies in general - like fixing any potential issues one might have and being under the control of the vendor of those technologies.
These are all widely known and talked about issues, it might be a good idea to not dismiss them.
MS DirectCompute also works. Yet last time I checked, MS Azure didn't support DirectCompute with their fast GPUs. These virtual machines come with the TCC (Tesla Compute Cluster) driver, which only supports CUDA; DirectCompute requires a WDDM (Windows Display Driver Model) driver. https://social.msdn.microsoft.com/forums/en-US/2c1784a3-5e09...
> C++ AMP headers are deprecated, starting with Visual Studio 2022 version 17.0. Including any AMP headers will generate build errors. Define _SILENCE_AMP_DEPRECATION_WARNINGS before including any AMP headers to silence the warnings.
So please don't rely on DirectCompute. It's firmly in legacy territory. Microsoft didn't invest the effort necessary to make it thrive.
DirectCompute is a low-level tech, a subset of D3D11 and 12. It’s not deprecated, used by lots of software, most notably videogames. For instance, in UE5 they’re even rasterizing triangles with compute shaders, that’s DirectCompute technology.
Some things are worse than CUDA: a different programming language (HLSL), manually managed GPU buffers, and compatibility issues related to FP64 math support.
Some things are better than CUDA: no need to install huge third-party libraries, and it's integrated with other GPU-related things (D2D, DirectWrite, desktop duplication, Media Foundation). It's also vendor agnostic, working on AMD and Intel too.
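For a feel of what that looks like in practice, here's a hedged D3D11 sketch (Windows-only; error handling and the buffer/UAV setup are omitted): the HLSL is compiled with the OS-provided compiler and dispatched through the same device you'd use for graphics, no vendor SDK required.

    #include <d3d11.h>
    #include <d3dcompiler.h>
    #include <string.h>
    #include <stdio.h>
    // link: d3d11.lib d3dcompiler.lib

    // The compute shader is HLSL text, compiled with the OS-provided compiler;
    // it runs on NVIDIA, AMD, and Intel alike.
    static const char *hlsl =
        "RWStructuredBuffer<float> data : register(u0);"
        "[numthreads(64, 1, 1)]"
        "void main(uint3 id : SV_DispatchThreadID) { data[id.x] *= 2.0f; }";

    int main(void) {
        ID3D11Device *dev = NULL;
        ID3D11DeviceContext *ctx = NULL;
        D3D11CreateDevice(NULL, D3D_DRIVER_TYPE_HARDWARE, NULL, 0,
                          NULL, 0, D3D11_SDK_VERSION, &dev, NULL, &ctx);

        ID3DBlob *cs = NULL, *err = NULL;
        D3DCompile(hlsl, strlen(hlsl), NULL, NULL, NULL,
                   "main", "cs_5_0", 0, 0, &cs, &err);

        ID3D11ComputeShader *shader = NULL;
        dev->CreateComputeShader(cs->GetBufferPointer(), cs->GetBufferSize(),
                                 NULL, &shader);

        ctx->CSSetShader(shader, NULL, 0);
        // Buffers/UAVs would be created and bound here (the "manually managed
        // GPU buffers" part); omitted to keep the sketch short.
        ctx->Dispatch(16, 1, 1);     // 16 groups of 64 threads
        printf("dispatched a compute shader without any vendor SDK\n");
        return 0;
    }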
I think I tried that a year ago, and it didn’t work. Documentation agrees, it says “GRID drivers redistributed by Azure do not work on non-NV series VMs like NCv2, NCv3” https://docs.microsoft.com/en-us/azure/virtual-machines/wind... Microsoft support told me the same. I wanted NCv3 because on paper, V100 GPU is good at FP64 arithmetic which we use a lot in our compute shaders.
In my experience the AMD OpenCL implementation was worse than NVIDIA's OpenCL implementation, and not a little worse, but a lot worse. NVIDIA beat AMD at AMD's own game -- even though NVIDIA had every incentive to sandbag. It was shameful.