Try Llama 3.3 70B, on Groq or something. It runs on a 64GB MacBook (4-bit quantized, which doesn't seem to impact quality much). Things have come a long way. Compare it to Llama 2 70B. It's wild.
Llama 3.3 70B 8-bit MLX runs on a 128GB MacBook at 7+ tokens per second while a full suite of other tools is running, even at a 130k-token context, and behaves with surprising coherence. Reminded me of this time last year, first trying Mixtral 8x22B, which still offers a distinctive je ne sais quoi!