> If a small local model can handle the simple stuff and delegate harder tasks to a model in the data center when connectivity allows, you could get an even better experience.
Distributed mixture of experts sounds like an idea. Is anyone doing that?
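Something like the pattern in the quote could be sketched pretty simply. Here's a rough Python illustration of "local model first, escalate to the cloud when the task looks hard" — the endpoint, the model calls, and the difficulty heuristic are all hypothetical placeholders, not any particular vendor's API:

```python
import re

CLOUD_ENDPOINT = "https://example.com/v1/chat"  # placeholder, not a real service

def looks_hard(prompt: str) -> bool:
    """Crude difficulty heuristic: very long prompts, or ones asking for
    heavy reasoning or document translation, get escalated to the cloud."""
    return len(prompt) > 500 or bool(
        re.search(r"\b(prove|derive|translate this document)\b", prompt, re.I)
    )

def run_local(prompt: str) -> str:
    # Stand-in for an on-device call to a small quantized model.
    return f"[local model] quick answer to: {prompt[:40]}..."

def run_cloud(prompt: str) -> str:
    # Stand-in for a data-center model call; in practice an HTTPS request
    # to CLOUD_ENDPOINT with auth, retries, and a connectivity check.
    return f"[cloud model] detailed answer to: {prompt[:40]}..."

def answer(prompt: str, online: bool) -> str:
    # Delegate only when we're online and the task looks hard enough to be worth it.
    if online and looks_hard(prompt):
        return run_cloud(prompt)
    return run_local(prompt)

if __name__ == "__main__":
    print(answer("What's 2+2?", online=True))
    print(answer("Translate this document into Japanese: " + "x" * 600, online=True))
```

That's just routing/fallback, though, not a real distributed mixture of experts where different devices host different experts.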
Sounds like an attack vector waiting to happen if you deploy enough competing expert devices into a crowd.
I’m imagining a lot of these LLM products on phones will be used for live translation. Picture a large crowd event where people relying on live AI translation are fed completely false translations because a malicious actor pulled off a 51% attack.
I’m not particularly scared of a 51% attack between the devices attached to my Apple ID. If my iPhone splits inference work with my idle MacBook, Apple TV, and iPad, what’s the problem there?