Hacker News
Carrok | 59 days ago | on: Nvidia Dynamo: A Datacenter Scale Distributed Infe...
This is probably true, but unlike every Nvidia product we tried, it did, you know, reply to inference requests with actual output. That said, you can serve vLLM with Ray Serve.
https://docs.ray.io/en/latest/serve/tutorials/vllm-example.h...
ipsum2 | 59 days ago
Ray doesn't add anything if you're just running vLLM on top of Ray Serve, though.
dzr0001 | 59 days ago
It does if you need pipeline parallelism across multiple nodes.
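For concreteness, a hedged sketch of what that looks like: vLLM can use Ray as its distributed executor backend to split a model's layers across nodes (the flag names are vLLM's, but the model and parallelism sizes below are placeholders):

```shell
# Config fragment, not a tested command: pipeline-parallel the model's
# layers across 2 nodes, tensor-parallel within each node, with Ray
# coordinating the workers across the cluster.
vllm serve meta-llama/Llama-3.1-70B-Instruct \
    --tensor-parallel-size 8 \
    --pipeline-parallel-size 2 \
    --distributed-executor-backend ray
```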