visarga on Aug 30, 2023 | on: Understanding Llama 2 and the New Code Llama LLMs
You don't have to run the whole MoE model: for each token, only 1/N of the model is used, where N is the number of experts. So its compute utilisation scales more slowly than its memory usage.
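To make that arithmetic concrete, here is a minimal top-1 routing sketch in Python (the names, toy dimensions, and the simple linear router are illustrative assumptions, not from the comment): all N expert weight matrices are held in memory, but each token's forward pass multiplies through only one of them.

```python
import numpy as np

# Toy mixture-of-experts layer: N expert MLPs are all held in memory,
# but each token is routed to a single expert, so per-token compute
# touches only 1/N of the expert parameters.

rng = np.random.default_rng(0)
d_model, d_ff, num_experts = 16, 64, 8

# All experts live in memory (memory cost grows with num_experts).
experts = [
    (rng.standard_normal((d_model, d_ff)), rng.standard_normal((d_ff, d_model)))
    for _ in range(num_experts)
]
router = rng.standard_normal((d_model, num_experts))  # simple linear router

def moe_forward(x):
    """Route each token to its top-1 expert and apply only that expert."""
    out = np.empty_like(x)
    expert_ids = np.argmax(x @ router, axis=-1)  # top-1 routing per token
    for i, (token, e) in enumerate(zip(x, expert_ids)):
        w_in, w_out = experts[e]
        out[i] = np.maximum(token @ w_in, 0.0) @ w_out  # one expert's MLP only
    return out

tokens = rng.standard_normal((4, d_model))
y = moe_forward(tokens)

params_in_memory = num_experts * 2 * d_model * d_ff   # what you must store
params_per_token = 2 * d_model * d_ff                  # what you must compute with
print(f"parameters in memory:  {params_in_memory}")
print(f"parameters per token:  {params_per_token} (1/{num_experts} of the total)")
```

With top-1 routing the per-token FLOPs stay fixed as you add experts, while memory grows linearly with N, which is the scaling gap the comment describes.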