
"Full 131k" context , actually the full context is double that at 262144 context and with 8x yarn mutiplier it can go up to 2million. It looks like even full chip scale Cerebras has trouble with context length, well, this is a limitation of the transformer architechture itself where memory requirements scale ~linearly and compute requirements roughly quadratically with the increase in kv cache.

Anyway, YOU'RE NOT SERVING FULL CONTEXT, CEREBRAS; YOU'RE SERVING HALF. Also, what quantization exactly is this? Can customers know?



The model page says 32,768 tokens natively, with performance validated for up to 4x YaRN: https://huggingface.co/Qwen/Qwen3-235B-A22B#processing-long-...

That would seem to align with the 131k number (32,768 × 4 = 131,072)?
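
For reference, the model card's suggested way to enable YaRN (as I read it) is a rope_scaling block in the model config; a minimal sketch below, assuming the Transformers rope_scaling convention, so the exact keys are worth double-checking against the card.

    # Hedged sketch: enabling 4x YaRN for Qwen3 via the Transformers
    # rope_scaling convention; verify keys against the model card.
    from transformers import AutoConfig

    cfg = AutoConfig.from_pretrained("Qwen/Qwen3-235B-A22B")
    cfg.rope_scaling = {
        "rope_type": "yarn",
        "factor": 4.0,                             # 4x the native window
        "original_max_position_embeddings": 32768  # native context per the card
    }
    cfg.max_position_embeddings = 131072           # 32768 * 4

If memory serves, the card also cautions that static YaRN applies the scaling factor regardless of input length, which can degrade quality on short inputs, so it recommends enabling it only when long context is actually needed.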



