This is probably true, but unlike every Nvidia product we tried, it did, you know, reply to inference requests with actual output. That said, you can serve vLLM with Ray Serve. https://docs.ray.io/en/latest/serve/tutorials/vllm-example.h...

Ray doesn't offer anything extra if you're running vLLM on top of Ray Serve, though.

It does if you need pipeline parallelism across multiple nodes.