
Just curious what your issues with Triton were. We've done OK using it to serve LLMs with a classifier head via the HF Transformers pipeline and Flash Attention 2, as well as text generation models with the vLLM back-end.



Triton is not that bad; TensorRT will give you nightmares.


100% - probably why vLLM is now the default back-end in Dynamo.



