
Typically requests are binned by context length so that they can be batched together. So you might have a 10k bin, a 50k bin, and a 500k bin, and then you drop any context past 500k. The costs are therefore fixed per bin.
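The binning scheme above can be sketched roughly like this (the bin sizes and function name are illustrative, not taken from any particular serving stack):

```python
# Illustrative bin caps: requests are routed to the smallest bin
# whose cap covers their context length.
BINS = [10_000, 50_000, 500_000]

def assign_bin(context_len: int):
    """Return the cap of the smallest bin that fits the request,
    or None if the context exceeds the largest bin (context past
    the cap is dropped)."""
    for cap in BINS:
        if context_len <= cap:
            return cap
    return None  # past 500k: drop the excess context

# Requests in the same bin can then be padded to the bin cap and
# batched together, so per-request cost is fixed per bin.
```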


Makes sense. Each model also has a maximum context length, so they could charge a flat per-token rate that assumes the full context for that model, i.e. price for the worst case.



