
Try Llama 3.3 70B, on Groq or similar. It runs on a 64GB MacBook (4-bit quantized, which seems to cost little in quality). Things have come a long way; compare it to Llama 2 70B. It's wild.


Llama 3.3 70B (8-bit, MLX) runs on a 128GB MacBook at 7+ tokens per second while a full suite of other tools is running, even at a 130k-token context, and behaves with surprising coherence. It reminded me of this time last year, first trying Mixtral 8x22B — which still offers a distinctive je ne sais quoi!
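The memory figures in these comments check out with back-of-the-envelope arithmetic. A rough sketch (weights only; the KV cache and activations add more on top, especially at a 130k-token context, and the 70e9 parameter count is a round-number assumption):

```python
def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GiB for a quantized model."""
    return n_params * bits_per_weight / 8 / 2**30

# Llama 3.3 70B at common quantization levels.
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{weight_gib(70e9, bits):.0f} GiB")
```

So 8-bit weights (~65 GiB) leave headroom on a 128GB machine, and 4-bit weights (~33 GiB) fit comfortably in 64GB, consistent with both comments above.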



