
I think the idea is that they just feed each candidate answer to the RLHF reward model that was used to train the model, and return the answer with the highest reward.
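To make that concrete, here is a minimal sketch of that kind of best-of-N selection. This is only my guess at the shape of it, not anyone's actual implementation: `best_of_n` and `toy_reward` are hypothetical names, and `toy_reward` is a trivial stand-in (word overlap with the prompt) for a real learned reward model.

```python
from typing import Callable, Sequence


def best_of_n(
    prompt: str,
    candidates: Sequence[str],
    reward_fn: Callable[[str, str], float],
) -> str:
    """Return the candidate that reward_fn scores highest for this prompt.

    reward_fn is an assumed interface: (prompt, answer) -> scalar reward.
    Ties go to the earliest candidate, because max() keeps the first maximum.
    """
    if not candidates:
        raise ValueError("need at least one candidate answer")
    return max(candidates, key=lambda answer: reward_fn(prompt, answer))


def toy_reward(prompt: str, answer: str) -> float:
    """Hypothetical stand-in for a reward model.

    Counts how many words in the answer also appear in the prompt.
    A real RLHF reward model would be a trained network, not this.
    """
    prompt_words = set(prompt.lower().split())
    return float(sum(1 for word in answer.lower().split() if word in prompt_words))


if __name__ == "__main__":
    prompt = "why is the sky blue"
    samples = [
        "I do not know",
        "the sky is blue because of scattering",
        "blue",
    ]
    print(best_of_n(prompt, samples, toy_reward))
```

The selection step is just an argmax over scores, so all of the quality comes from how good the reward model is and how many samples you draw.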




