I'm all for running as much on the edge as possible, but we're not even close to being able to do real-time inference with frontier models on Macs or iPads, and that's just for vanilla LLM chatbots. Low-precision Llama 3 8B is awesome, but it isn't a Claude 3 replacement, it drains my battery, and it's slow (M1 Max).
Multimodal agent setups are going to be data center/home-lab only for at least the next five years.
Apple isn't about to put 80GB of VRAM in an iPad for about 15 reasons.
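For rough numbers, here's a back-of-envelope sketch of weight memory alone (my own assumed parameter counts and precisions, ignoring KV cache, activations, and the OS entirely):

    # Back-of-envelope: weight memory for a dense transformer
    # at a given parameter count and precision. Numbers are my
    # assumptions, not anything Apple or Meta has published.
    def weight_memory_gb(params_billion: float, bits_per_param: float) -> float:
        bytes_total = params_billion * 1e9 * bits_per_param / 8
        return bytes_total / 1e9  # decimal GB

    for name, params in [("Llama 3 8B", 8), ("70B-class", 70)]:
        for bits in (16, 4):
            gb = weight_memory_gb(params, bits)
            print(f"{name} @ {bits}-bit: ~{gb:.0f} GB of weights")

    # 8B @ 4-bit is ~4 GB, which is why it runs on an M1 Max at all.
    # 70B @ 16-bit is ~140 GB; even 4-bit is ~35 GB before you've
    # allocated a single byte of KV cache. That's laptop/workstation
    # territory at best, not iPad territory.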