
I believe they're referencing Figure 8 from the GPT-4 technical report [0], which shows that the pre-trained model's confidence in each answer choice (a, b, c, or d) closely tracks its actual probability of being correct, i.e., the model is well calibrated, while after PPO (RLHF) the calibration curve is quite a bit flatter.

[0]: https://cdn.openai.com/papers/gpt-4.pdf
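
If anyone wants to reproduce that kind of calibration plot on their own eval data, here's a rough sketch. The variable names (`confidences`, `correct`) and the toy data are my own placeholders, not anything from the report; it just bins answers by the model's confidence and compares mean confidence to accuracy in each bin.

    # Minimal calibration-curve sketch (illustrative, not OpenAI's code).
    # `confidences`: model's probability on its chosen answer per question.
    # `correct`: whether that answer was actually right.
    import numpy as np

    def calibration_curve(confidences, correct, n_bins=10):
        """Bin predictions by confidence; compare mean confidence to accuracy."""
        confidences = np.asarray(confidences, dtype=float)
        correct = np.asarray(correct, dtype=float)
        bins = np.linspace(0.0, 1.0, n_bins + 1)
        curve = []
        for lo, hi in zip(bins[:-1], bins[1:]):
            mask = (confidences >= lo) & (confidences < hi)
            if mask.any():
                curve.append((confidences[mask].mean(), correct[mask].mean()))
        return curve  # well calibrated: points lie near the diagonal

    if __name__ == "__main__":
        # Toy usage with simulated "calibrated" behaviour: accuracy tracks confidence.
        # A flatter curve (as reported post-RLHF) means confidence is less informative.
        rng = np.random.default_rng(0)
        conf = rng.uniform(0.25, 1.0, 1000)
        right = rng.uniform(size=1000) < conf
        for c, a in calibration_curve(conf, right):
            print(f"confidence {c:.2f} -> accuracy {a:.2f}")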



I see, thanks. It's remarkable that RLHF has such a drastically negative impact on the model's understanding of the world. I guess that explains the degrading-unicorn problem. It makes me wonder how much better at coding an instruct-trained but non-aligned model would be.



