Hacker News
dlewis1788
61 days ago
on:
Nvidia Dynamo: A Datacenter Scale Distributed Infe...
Just curious what your issues with Triton were. We've done OK using it to serve LLMs with a classifier head via the HF Transformers pipeline and Flash Attention 2, as well as serving text generation models with the vLLM back-end.
bytesandbits
58 days ago
Triton is not that bad; TensorRT will give you nightmares.
dlewis1788
48 days ago
100% - probably why vLLM is now the default back-end in Dynamo.