
Data point: my MacBook Pro 16" with the M3 Max (64GB) runs 34B-model inference about as fast as (or slightly faster than) ChatGPT runs GPT-4.

I am now running phind-codellama:34b-v2-q8_0 through ollama and the experience is very good.
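For anyone wanting to reproduce this setup, the ollama invocation looks roughly like this (assumes ollama is installed; the q8_0 quantization of a 34B model is a large download, so the 64GB of RAM matters):

```shell
# Fetch the quantized model weights locally
ollama pull phind-codellama:34b-v2-q8_0

# Start an interactive session with the model
ollama run phind-codellama:34b-v2-q8_0
```
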

All that said, though, every model I tried couldn't hold a candle to GPT-4: they all produce crappy results, aren't good at translation, and can't really do much for me. They are toys: I go "ooh" and "aah" over them, then realize they aren't that useful and go back to using GPT-4.

Perhaps 34B is still not enough to get anything reasonable.


