Hacker News
paraschopra | 11 months ago | on: TokenFormer: Rethinking Transformer Scaling with T...
Why would it be higher? You can keep the KV cache precomputed, like before.