Hacker News | xixihaha's comments

Very bold direction and I love it. Looks like it took a lot of CUDA engineering expertise. I'm wondering why the batch size is set to 1? I hope to see a comparison against a real production setup with larger batch sizes. Also wondering how to extend it to other models, like MoE with expert parallelism — aren't CUDA kernels unable to span GPUs?


Because people using it interactively use batch size 1.
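To make the tradeoff concrete, here's a toy cost model (all numbers assumed, not from the project being discussed): each decode step pays a fixed overhead (weight loading, kernel launches) plus a small per-sequence cost. Batching raises aggregate throughput, but every user's per-token latency grows with batch size, which is why interactive serving favors batch size 1.

```python
# Toy decode-step cost model (numbers are illustrative assumptions).
FIXED_MS = 10.0    # fixed per-step cost: weight loads, kernel launches (assumed)
PER_SEQ_MS = 0.5   # marginal cost per sequence in the batch (assumed)

def step_time_ms(batch_size: int) -> float:
    """Time for one decode step producing one token per sequence."""
    return FIXED_MS + PER_SEQ_MS * batch_size

for b in (1, 8, 32):
    latency = step_time_ms(b)                 # ms until each user sees a token
    throughput = b / step_time_ms(b) * 1000   # total tokens/s across all users
    print(f"batch={b:2d}  per-token latency={latency:5.1f} ms  "
          f"throughput={throughput:7.1f} tok/s")
```

Under this model, batch 32 delivers far more aggregate tokens/s, but each individual user waits over twice as long per token as at batch 1 — fine for offline jobs, bad for a chat session.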

