
First, this is not an open source / weight release.

Second, it has the problem of non-stopping responses.



What's the best technique to train the model to stop responding? A bit of fine-tuning on texts with EOS markers?


I haven't seen many papers on solving this problem.

I see non-stop responses as a generalization problem, because every training sample has finite length, so the model has seen sequences end during training and still fails to end its own.

Targeted supervised fine-tuning should work, as long as you have enough samples. However, supervised fine-tuning tends not to generalize well, so it may not help on prompts unlike the ones you fine-tuned on.
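
To make the fine-tuning idea concrete, here is a rough sketch of how a sample with an EOS marker could be prepared, so the model is actually trained to predict the end of the response. Everything in it is an assumption for illustration: the name build_example, the token ids, and the max_len limit are made up, and -100 is used only because it is the default ignore_index of PyTorch's CrossEntropyLoss. It is not taken from any particular model's training code.

```python
# Label value skipped by the loss. -100 matches the default ignore_index of
# PyTorch's CrossEntropyLoss; any trainer with a different convention would
# need a different value (assumption).
IGNORE_INDEX = -100


def build_example(prompt_ids, response_ids, eos_id, max_len):
    """Build (input_ids, labels) for one supervised fine-tuning sample.

    The EOS token is appended to the response and kept in the labels, so the
    loss rewards the model for predicting "stop" after the last response token.
    Samples that would exceed max_len are dropped instead of truncated, because
    truncation would cut off the EOS and teach the model to keep going.
    """
    input_ids = list(prompt_ids) + list(response_ids) + [eos_id]
    # Prompt positions are masked out; response tokens and EOS are supervised.
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids) + [eos_id]
    if len(input_ids) > max_len:
        return None
    return input_ids, labels


if __name__ == "__main__":
    # Hypothetical token ids: prompt = [5, 6], response = [7, 8], EOS = 2.
    print(build_example([5, 6], [7, 8], eos_id=2, max_len=8))
```

The two things worth checking in a real pipeline are the ones this sketch makes explicit: that the EOS token is not masked out of the loss, and that it is not silently removed by truncation or sequence packing.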



