This is much more nuanced now. See Apple's "Private Cloud Compute" (https://security.apple.com/blog/private-cloud-compute/): they run many of the larger models on their own servers.

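Fundamentally, it is more efficient to process a batch of tokens from many users/requests at once than to process a single user's request on device. During decoding, the model weights have to be read from memory at every step, and batching lets one read of the weights serve many requests instead of just one.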
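A back-of-the-envelope sketch of why batching helps, assuming a single dense d x d layer stored in fp16 (2 bytes per parameter), one new token per request per decoding step, and ignoring activation traffic. `arithmetic_intensity` is a hypothetical helper for illustration only. Under these assumptions, the FLOPs done per byte of weights read grow linearly with batch size, because the weights are loaded once and shared by every request in the batch:

```python
def arithmetic_intensity(batch_size: int, d_model: int, bytes_per_param: int = 2) -> float:
    """FLOPs per byte of weights read for one decoding step of a d x d layer.

    Hypothetical simplification: activation reads/writes are ignored.
    """
    # Each token does one multiply and one add per weight.
    flops = 2 * batch_size * d_model * d_model
    # The weight matrix is read once per step, regardless of batch size.
    weight_bytes = d_model * d_model * bytes_per_param
    return flops / weight_bytes


for b in (1, 8, 64):
    print(f"batch={b:3d}  FLOPs/byte={arithmetic_intensity(b, 4096):.0f}")
```

With batch size 1 the hardware mostly waits on memory; a server that packs dozens of users' tokens into each step gets far more useful compute out of the same memory bandwidth.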