
From an infrastructure perspective, if you have access to the hardware, a fun starting point is running the NCCL tests across the infrastructure. Start with a single GPU, then 8 GPUs on one host, then 24 GPUs across multiple hosts over IB or RoCE. You will get a feel for MPI and find plenty of knobs to turn on the Kubernetes side.
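
To compare results across those three steps, it helps to know how the nccl-tests all-reduce benchmark derives its two bandwidth columns. Here is a rough Python sketch of that arithmetic: algorithm bandwidth is bytes over time, and bus bandwidth scales it by 2(n-1)/n for all-reduce. The function name, the 1 GiB message size, and the 10 ms timing are made up for illustration and are not measurements.

```python
from typing import Tuple


def allreduce_bandwidth(size_bytes: int, seconds: float, n_ranks: int) -> Tuple[float, float]:
    """Return (algbw, busbw) in GB/s for one all-reduce.

    algbw is simply bytes / time. busbw applies the all-reduce
    correction factor 2 * (n - 1) / n so that numbers are comparable
    across different rank counts.
    """
    algbw = size_bytes / seconds / 1e9
    busbw = algbw * 2 * (n_ranks - 1) / n_ranks
    return algbw, busbw


# Hypothetical timing: 1 GiB reduced in 10 ms at each scale step.
for n_ranks in (1, 8, 24):
    algbw, busbw = allreduce_bandwidth(1 << 30, 0.010, n_ranks)
    print(f"{n_ranks:>2} ranks: algbw={algbw:.1f} GB/s busbw={busbw:.1f} GB/s")
```

With a single GPU the factor is zero, so bus bandwidth only becomes meaningful at the 8 GPU step and beyond.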

