Hacker News
new
|
past
|
comments
|
ask
|
show
|
jobs
|
submit
login
moultano
on May 18, 2023
|
parent
|
context
|
favorite
| on:
Ask HN: Can someone ELI5 transformers and the “Att...
differentiable relative to the model parameters. The attention mechanism does store(key(x), value(x)), followed by lookup(query(y)), where key(), value(), lookup(), query() are all designed to be differentiable.
Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4
Guidelines
|
FAQ
|
Lists
|
API
|
Security
|
Legal
|
Apply to YC
|
Contact
Search: