A key issue, it seems to me, is that they didn't roll out their new models gradually and don't have reliable ways to measure model performance.
Worse, I would have expected them by now to be running many different versions depending on each user's expected use case. Power users probably shouldn't be handled the same way as casual users. Yet everyone got the same bad system prompt.