Hacker News | xixihaha's comments

Very bold direction and I love it. Looks like it took a lot of CUDA engineering expertise. I'm wondering why the batch size is set to 1? I hope to see a comparison against a real production setup with larger batch sizes. Also wondering how to extend it to other models, like MoE with expert parallelism — aren't CUDA kernels unable to span GPUs?


Because people using it interactively use batch size 1.
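To make the tradeoff concrete, here's a toy cost model (all numbers assumed, not from the project being discussed): each decode step pays a fixed overhead (weight loading, kernel launches) plus a small per-sequence cost. Batching raises aggregate throughput, but every user's per-token latency grows with batch size, which is why interactive serving favors batch size 1.

```python
# Toy decode-step cost model (numbers are illustrative assumptions).
FIXED_MS = 10.0    # fixed per-step cost: weight loads, kernel launches (assumed)
PER_SEQ_MS = 0.5   # marginal cost per sequence in the batch (assumed)

def step_time_ms(batch_size: int) -> float:
    """Time for one decode step producing one token per sequence."""
    return FIXED_MS + PER_SEQ_MS * batch_size

for b in (1, 8, 32):
    latency = step_time_ms(b)                 # ms until each user sees a token
    throughput = b / step_time_ms(b) * 1000   # total tokens/s across all users
    print(f"batch={b:2d}  per-token latency={latency:5.1f} ms  "
          f"throughput={throughput:7.1f} tok/s")
```

Under this model, batch 32 delivers far more aggregate tokens/s, but each individual user waits over twice as long per token as at batch 1 — fine for offline jobs, bad for a chat session.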

