
This is much more nuanced now. See Apple's "Private Cloud Compute": https://security.apple.com/blog/private-cloud-compute/. They run a lot of the larger models on their own servers.

Fundamentally, it is more efficient to process a batch of tokens from multiple users/requests on a server than to process tokens from a single user's request on device.
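
A rough way to see the batching point, as a back-of-the-envelope sketch rather than a measurement: assume each decode step has to read all of the model weights from memory once, and that one read is shared by every request in the batch. The function name, the 3B parameter count, and the 2 bytes per parameter are made up for illustration, and this ignores KV-cache traffic, activations, and compute limits.

```python
def weight_bytes_per_token(param_count, bytes_per_param, batch_size):
    """Weight bytes read from memory per generated token in one decode step.

    Simplifying assumption: the weights are read once per step and that
    read is amortized over every request in the batch.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return param_count * bytes_per_param / batch_size


if __name__ == "__main__":
    # Hypothetical 3B-parameter model stored at 2 bytes per parameter.
    for batch_size in (1, 8, 64):
        per_token = weight_bytes_per_token(3_000_000_000, 2, batch_size)
        print(f"batch={batch_size:3d}  weight bytes per token = {per_token:,.0f}")
```

Under those assumptions a single on-device request (batch of 1) pays the full weight read for every token, while a server batching many requests splits that cost across them.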



