
In fact, I'm not sure how the "we will need tons of centralized inference infrastructure" argument works when Apple, with over 50% smartphone market share in the USA, is pursuing the opposite, privacy-focused strategy: on-device inference.



This is much more nuanced now. See Apple's "Private Cloud Compute": https://security.apple.com/blog/private-cloud-compute/ ; they run a lot of the larger models on their own servers.

Fundamentally, it is more efficient to process a batch of tokens from multiple users' requests on shared servers than to process a single user's request on device: the model weights have to be streamed through memory either way, and batching amortizes that cost across every request in the batch.
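
A minimal NumPy sketch of that amortization effect (the matrix size and batch of 32 are made-up numbers, not any vendor's actual stack):

    import time
    import numpy as np

    # Hypothetical sizes: one transformer-like weight matrix, 32 concurrent requests.
    d_model = 4096
    W = np.random.randn(d_model, d_model).astype(np.float32)   # "model weights"
    x_single = np.random.randn(1, d_model).astype(np.float32)  # one user's token
    x_batch = np.random.randn(32, d_model).astype(np.float32)  # 32 users' tokens

    def bench(x, iters=50):
        start = time.perf_counter()
        for _ in range(iters):
            _ = x @ W  # every pass streams all of W through memory
        return (time.perf_counter() - start) / iters

    t1 = bench(x_single)
    t32 = bench(x_batch)
    print(f"single request: {t1 * 1e3:.2f} ms/token")
    print(f"batch of 32:    {t32 * 1e3:.2f} ms total, {t32 / 32 * 1e3:.2f} ms/token")

On most machines the per-token time drops sharply in the batched case, because the reads of W are shared across all 32 rows instead of being repeated per request.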


Apple's strategy has already failed. Their big bet on NPU hardware did not pay off: right now it is effectively wasted silicon on every iDevice while the GPU does all the heavy inference work. Now they partner with OpenAI to handle their inference (and even that's not good enough in many cases[0]). The "centralized compute" lobby is being paid by Apple to do the work its devices cannot.
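
For context on the NPU-vs-GPU split: with coremltools, compute-unit selection is only a hint, which is part of why so much work ends up off the Neural Engine. A minimal sketch (the model file name is hypothetical; requires macOS with Core ML):

    import coremltools as ct

    # Hypothetical converted model; any .mlpackage would do.
    MODEL_PATH = "SomeModel.mlpackage"

    # Ask Core ML to prefer the CPU plus the Neural Engine (ANE)...
    model_ane = ct.models.MLModel(MODEL_PATH,
                                  compute_units=ct.ComputeUnit.CPU_AND_NE)

    # ...versus the CPU plus the GPU. These are hints, not guarantees:
    # ops the ANE cannot run silently fall back to the GPU or CPU,
    # no matter what the developer requested.
    model_gpu = ct.models.MLModel(MODEL_PATH,
                                  compute_units=ct.ComputeUnit.CPU_AND_GPU)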

Until Apple or AMD unifies their GPU architectures and implements comparably sophisticated streaming multiprocessors, Nvidia will remain in a class of their own. Apple used to lead the charge on the foremost CUDA alternative too (OpenCL was originally theirs), but then they abandoned it in favor of their proprietary Metal stack. It's pretty easy to argue that Apple shot themselves in the foot at every opportunity they had to compete in good faith. And make no mistake: Apple could have competed with Nvidia if they weren't so stubborn about Linux support and about putting smartphone GPUs in laptops and desktops.

[0] https://apnews.com/article/apple-ai-news-hallucinations-ipho...



