
The current state of the art is RLHF (reinforcement learning from human feedback): the model is first trained to complete human-written text, then fine-tuned to maximize human ratings of whether its completions were "helpful" etc.

https://huggingface.co/blog/rlhf
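
To make the fine-tuning step concrete, here's a toy sketch, not how any real system does it. It assumes a "policy" that is just a softmax over three fixed candidate completions, and a made-up `human_feedback` score per completion standing in for the human ratings. All names and numbers are hypothetical. Real RLHF, as the linked post describes, trains a separate reward model on human preference data and optimizes a full language model with PPO plus a KL penalty; this only shows the "nudge the policy toward higher-rated completions" idea using an exact policy-gradient update.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_reward(probs, rewards):
    # Average feedback score the policy would get.
    return sum(p * r for p, r in zip(probs, rewards))

def fine_tune(logits, rewards, lr=0.5, steps=200):
    # Gradient ascent on expected reward.
    # For a softmax policy: dE[r]/dlogit_i = p_i * (r_i - E[r]).
    logits = list(logits)
    for _ in range(steps):
        probs = softmax(logits)
        baseline = expected_reward(probs, rewards)
        logits = [l + lr * p * (r - baseline)
                  for l, p, r in zip(logits, probs, rewards)]
    return logits

# Hypothetical candidate completions for one prompt.
completions = ["Sure, here is how...", "I dunno.", "lol"]
# Hypothetical "pretrained" preferences: the unhelpful answer is most likely.
pretrained_logits = [0.0, 1.0, 0.5]
# Hypothetical human ratings of helpfulness, one per completion.
human_feedback = [1.0, 0.2, 0.0]

before = softmax(pretrained_logits)
after = softmax(fine_tune(pretrained_logits, human_feedback))

for text, p0, p1 in zip(completions, before, after):
    print(f"{text!r}: {p0:.3f} -> {p1:.3f}")
```

After the updates, probability mass shifts toward the completion the (pretend) humans rated highest, which is the whole trick in miniature.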


