They have, because until now Apple Silicon was the only practical way for many to work with larger models at home: the machines can be configured with 64-192GB of unified memory, and even the laptops go up to 128GB.
Performance isn't amazing (roughly 4060 level, I think?), but in many ways it's the only game in town unless you're willing and able to build a multi-3090/4090 rig.
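For what it's worth, "working with larger models at home" on a Mac usually just means loading a quantized GGUF file and letting the Metal backend use that unified memory. A minimal sketch, assuming llama-cpp-python is installed and you already have a quantized model on disk (the file path below is hypothetical):

```python
from llama_cpp import Llama

# Load a quantized model; n_gpu_layers=-1 offloads all layers to Metal
# on Apple Silicon, so the whole model sits in unified memory.
llm = Llama(
    model_path="models/llama-2-70b.Q4_K_M.gguf",  # hypothetical path
    n_gpu_layers=-1,
    n_ctx=4096,
)

out = llm("Q: Why is unified memory useful for local LLMs? A:", max_tokens=64)
print(out["choices"][0]["text"])
```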
I'm currently wondering how likely it is I'll get into deeper LLM usage, and therefore how much Apple Silicon I need (because I'm addicted to macOS). So I'm some way closer to your steel man than you'd expect. But I'm probably a niche within a niche.
Doubt it; a year ago, running useful local LLMs on a Mac (via something like Ollama) was barely taking off.
If what you say is true, you were among the first 100 people on the planet doing this, which, btw, further supports my argument about how extremely rare that use case is for Mac users.
People were running llama.cpp on Mac laptops in March 2023, and Llama 2 was released in July 2023. People were buying Macs to run LLMs months before M3 machines became available in November 2023.
No one goes to an Apple store thinking "I'll get a laptop to do AI inference".