
8K is a sizable window. Sure, larger windows 'exist', but advertised context windows and functional context windows are not the same thing. I would rather have a model that can 'only' handle 8k tokens but handles 8k as well as it handles 1k than a model that 'can' handle 32k but whose output beyond 1k of context is realistically garbage.


Deepseek Coder v2 and Qwen2 are both great at 32k context. I can't tell the difference between those models at 8k and at 32k fully utilised. The difference in quality between them and 8k models when doing codegen is night and day. Not to mention that many of the little 8k models also use a 4k sliding window, which essentially makes them 4k models.
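To see why a sliding window effectively shrinks the context, here's a toy sketch of a sliding-window causal attention mask (window of 4 tokens rather than 4k; the function name and sizes are illustrative, not from any particular model):

```python
def sliding_window_mask(seq_len, window):
    """Causal mask where token i may attend only to tokens in
    [max(0, i - window + 1), i]. Information from further back can
    only reach token i indirectly, propagating layer by layer."""
    mask = []
    for i in range(seq_len):
        row = [1 if i - window < j <= i else 0 for j in range(seq_len)]
        mask.append(row)
    return mask

mask = sliding_window_mask(8, 4)
# The last token cannot directly attend to the first half of the sequence:
print(mask[7])  # [0, 0, 0, 0, 1, 1, 1, 1]
```

So even if such a model accepts an 8k prompt, no single attention layer ever looks further back than 4k tokens.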


I agree, they're exceptional models; however, the same cannot be said of all models that boast a large context window.



