PyTorch actually has surprisingly good support for Apple Silicon through its MPS backend. Occasionally an operation needs to fall back to the CPU, but many applications can run inference entirely on the GPU cores.
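A minimal sketch of what that looks like (PYTORCH_ENABLE_MPS_FALLBACK is PyTorch's real opt-in for the CPU fallback, and it has to be set before torch is imported):

    import os

    # Opt in to CPU fallback for ops the MPS backend doesn't implement yet.
    # Must be set before importing torch; fallback ops run noticeably slower.
    os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")

    import torch

    device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

    x = torch.randn(1024, 1024, device=device)
    y = x @ x  # matmul dispatched to the GPU via Metal Performance Shaders
    print(y.device)  # mps:0 on Apple Silicon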
I’ve found it to be pretty terrible compared to CUDA, especially with Hugging Face transformers. There’s no technical reason it has to be that way, though. Apple should fix that.
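To be fair, getting a transformers pipeline onto the MPS backend is just a device string, so it's easy to try yourself (a sketch, with gpt2 as a stand-in model; the slowness shows up per generated token, not in the setup):

    import torch
    from transformers import pipeline

    device = "mps" if torch.backends.mps.is_available() else "cpu"

    # gpt2 is just a small stand-in; any causal LM works the same way.
    pipe = pipeline("text-generation", model="gpt2", device=device)
    print(pipe("Apple Silicon is", max_new_tokens=20)[0]["generated_text"])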
MLX will probably be even faster than that, if the model is already ported. Faster startup time too. That’s my main pet peeve though: there’s no technical reason why PyTorch couldn’t be just as good. It’s just underfunding and neglect.
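If the model has been ported, the mlx-lm package makes the comparison easy to run (a sketch; the mlx-community checkpoint name is just an example of a pre-converted model):

    # Requires the mlx-lm package; the checkpoint is one of the
    # pre-converted models published under mlx-community on the HF Hub.
    from mlx_lm import load, generate

    model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.2-4bit")
    text = generate(model, tokenizer, prompt="Hello,", max_tokens=64)
    print(text)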