
A key issue, it seems to me, is that they didn't roll out their new models gradually and don't have reliable ways to measure model performance.
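To make "gradual rollout" concrete, here is a minimal sketch of one common approach: hash each user ID into a stable bucket and expose the new model only to buckets below a percentage you raise over time. This is an illustration, not a description of what they actually do; the names `rollout_bucket`, `pick_model`, `"new-model"`, `"old-model"`, and the salt are all made up for the example.

```python
import hashlib


def rollout_bucket(user_id: str, salt: str = "new-model-rollout") -> int:
    """Map a user ID to a stable bucket in 0..99 (hypothetical helper)."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    # Use the first 8 bytes as an integer, then reduce to 100 buckets.
    return int.from_bytes(digest[:8], "big") % 100


def pick_model(user_id: str, rollout_percent: int) -> str:
    """Return the model variant for a user at a given rollout percentage.

    The variant names are placeholders. A user who gets the new model at
    some percentage keeps it at every higher percentage, because the
    bucket never changes.
    """
    if rollout_bucket(user_id) < rollout_percent:
        return "new-model"
    return "old-model"


if __name__ == "__main__":
    for percent in (0, 5, 50, 100):
        print(percent, pick_model("user-123", percent))
```

Because the assignment is deterministic, you can compare quality metrics between the two groups at 5% before going to 50%, and roll back by lowering the percentage.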

Worse, I would have assumed that by now they run many different versions depending on each user's expected use case. Power users probably shouldn't be handled the same way as casual users. Yet everyone got the same bad system prompt.


