Never overestimate the intelligence of the decision makers at big bureaucratic tech companies. Also, it is not in the best interest of any of them to be reliant on NVDA or any other single vendor for any critical workload whatsoever. Doubly not so for NVDA's mostly closed source and haphazardly optimized libraries.
All that said, Bill Dally rocks, and NVDA is a hardened target. But the DL frameworks have enormous performance holes once one stops running ResNet-152 and other popular benchmark graphs, in the same way that 3DMark performance is not necessarily representative of actual gaming performance unless NVDA took it upon itself to make it so.
And since DL is such a dynamic field (just like game engines), I expect this situation to persist for a very, very long time.
And Intel knew that mobile chips would one day become very popular, too - two decades ago. Much good that knowledge did the company.
It's not about them being ignorant of it. It's about them making decisions in spite of that knowledge - decisions that Make Sense™ for the advancement and increased profitability of the incumbent cash cow, but that often run counter to, or actively undercut, investment in new tech.
Here's one main reason why Nvidia will not go "full TPU" with its chips - it wants "scalability". That means it wants an architecture that can be "flexible and serve different markets".
The companies that specialize in AI chips will likely beat them in performance because they only care about winning one market at a time (and the AI market is a pretty big one).
Intel's AI strategy is even more of a mess, because it has no clue what it can use to beat Nvidia, so its investments and developer ecosystems are all over the place.
Great point. Intel tried to break into the mobile and graphics market for sooo long with no success.
I do hope the AI-specific TPUs that come out in the future will follow the ARM model instead of being siloed into proprietary architectures. Fucking hate the vendor lock-in of NVIDIA with CUDA.
There is the question of incentives, though. The non-GPU companies want performance that is perfectly aligned to their workloads - something they can both build for and specify in advance. The GPU companies want to make those same companies buy and rebuy as much as they can.
As the areas of expertise and manufacturing grow closer, the advantages of paying someone to do it for you decrease.
I know much too little to have an opinion on who is likely correct, but I can understand the two sides each having positions that don't assume the other side is an idiot.
Yes, they've been trying to do that in various forms since the Fermi-based GTX 480 (2010), which crippled FP64. It never works. But it does create technical debt to work around their nonsense.
So unless they cripple CUDA altogether, there will always be efficient workarounds (in the very worst case, arguably DirectX or OpenGL programmable shaders). They even gave up on the crippling for the GTX Titan Black, then resumed with Maxwell. Currently, I would not be surprised if the lack of a true consumer Volta GPU is their one play at crippling consumer Volta: making it effectively nonexistent, or $3,000 in the form of the Titan V.
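For what it's worth, you don't even need a benchmark to see the crippling: CUDA exposes the FP32:FP64 throughput ratio as a device attribute. A minimal sketch, querying device 0 with error handling omitted (expect something like 32:1 on a consumer Pascal card versus 2:1 on a GV100-class part - treat those figures as approximate):

```
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // How many FP32 ops per FP64 op the hardware can retire, per the driver.
    int ratio = 0;
    cudaDeviceGetAttribute(&ratio, cudaDevAttrSingleToDoublePrecisionPerfRatio, 0);
    printf("FP32:FP64 throughput ratio = %d:1\n", ratio);
    return 0;
}
```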
What they could do across the board is ham-fistedly disable the deep learning frameworks on GeForce. That would probably stop 90% of amateur-hour data science on GeForce. But the remaining 10% would just recompile the frameworks without the cripple code, in violation of some sort of scary EULA clause forbidding that and requiring such cripple code in all HPC/AI applications. I would love to see them try this - they'll pry my FP32 MADs (the core operation of AI/ML as well as of vertex and pixel shaders) from my cold, dead consumer GPU desktop.
I don't think they'll do that though. They know the low-end is the entry point to their ecosystem. They just want to force people to graduate into the high-end after hooking them. Not that you have to: multiplication and addition want to be free.
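To make the FP32 MAD point concrete: the inner loop of a dense layer and the inner loop of a shader's lighting math are the same fused multiply-add, and it runs at full rate on every GeForce. A minimal sketch - the kernel and names are illustrative, not from any framework:

```
#include <cstdio>

// One output of a dense layer: a dot product, i.e. n fused multiply-adds.
// The same fmaf is what vertex and pixel shader math compiles down to.
__global__ void dense_row(const float *w, const float *x, float *y, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; ++i)
        acc = fmaf(w[i], x[i], acc);  // the MAD in question
    *y = acc;
}

int main() {
    const int n = 1024;
    float *w, *x, *y;
    cudaMallocManaged(&w, n * sizeof(float));
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, sizeof(float));
    for (int i = 0; i < n; ++i) { w[i] = 0.5f; x[i] = 2.0f; }
    dense_row<<<1, 1>>>(w, x, y, n);  // single thread, just to show the op
    cudaDeviceSynchronize();
    printf("y = %.1f (expect %.1f)\n", *y, (float)n);  // 0.5 * 2.0, n times
    return 0;
}
```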
1. Volta GPUs already have little matmul cores (the Tensor Cores), basically a bunch of little TPUs - see the WMMA sketch after this list.
2. The graphics-dedicated silicon is an extremely tiny portion of the die, a trivial component (source: Bill Dally, Nvidia chief scientist).
3. Memory access power and performance are the bottleneck (even in the TPU paper), and this will only get worse - see the back-of-envelope roofline after this list.
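On point 1, those matmul cores are directly programmable from CUDA through the WMMA API: each warp drives one 16x16x16 half-precision multiply-accumulate on the tensor cores per mma_sync. A minimal sketch, assuming a Volta part (compile with nvcc -arch=sm_70; error handling omitted):

```
#include <cstdio>
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

// One warp computes a single 16x16x16 tile: D = A*B + C on the tensor cores.
// Real kernels tile many of these across the full matrices.
__global__ void tiny_mma(const half *a, const half *b, float *c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> fa;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> fb;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> fc;
    wmma::fill_fragment(fc, 0.0f);
    wmma::load_matrix_sync(fa, a, 16);            // leading dimension 16
    wmma::load_matrix_sync(fb, b, 16);
    wmma::mma_sync(fc, fa, fb, fc);               // the "little TPU" op
    wmma::store_matrix_sync(c, fc, 16, wmma::mem_row_major);
}

int main() {
    half *a, *b; float *c;
    cudaMallocManaged(&a, 256 * sizeof(half));
    cudaMallocManaged(&b, 256 * sizeof(half));
    cudaMallocManaged(&c, 256 * sizeof(float));
    for (int i = 0; i < 256; ++i) { a[i] = __float2half(1.0f); b[i] = __float2half(1.0f); }
    tiny_mma<<<1, 32>>>(a, b, c);                 // exactly one warp
    cudaDeviceSynchronize();
    printf("c[0] = %.1f (expect 16.0)\n", c[0]);  // row of ones dot column of ones
    return 0;
}
```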
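And on point 3, the memory wall falls out of back-of-envelope roofline arithmetic. Using public Titan V specs (~110 TFLOP/s tensor-core peak, ~653 GB/s HBM2 - treat both numbers as approximate):

```
#include <cstdio>

int main() {
    const double peak_flops = 110e12;  // ~110 TFLOP/s FP16 tensor-core peak
    const double mem_bw     = 653e9;   // ~653 GB/s HBM2 bandwidth
    // Arithmetic intensity needed before compute, not memory, is the limit:
    printf("ridge point: %.0f FLOP/byte\n", peak_flops / mem_bw);  // ~168
    return 0;
}
```

Anything below roughly 170 FLOPs per byte moved leaves the math units starved - consistent with the point above that memory, not compute, is the bottleneck.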