This is probably true, but unlike every Nvidia product we tried, it did, you know, reply to inference requests with actual output. That said, you can serve vLLM with Ray Serve. https://docs.ray.io/en/latest/serve/tutorials/vllm-example.h...

Ray doesn't offer anything extra if you're running vLLM on top of Ray Serve, though.

It does if you need pipeline parallelism across multiple nodes.