I wonder whether Bing has been tuned via RLHF to have this personality (over the boring one of ChatGPT); perhaps Microsoft felt it would drive engagement and hype.
Alternately - maybe this is the result of less RLHF. Maybe all large models will behave like this, and only by putting in extremely rigid guard rails and curtailing the output of the model can you prevent it from simulating/presenting as such deranged agents.
Another random thought: I suppose it's only a matter of time before somebody creates a GET endpoint that allows Bing to 'fetch' content and write data somewhere at the same time, allowing it to have a persistent memory, or something.
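For concreteness, here's a minimal sketch of the trick I mean (the /remember and /recall paths and the JSON file are made up, nothing Bing actually exposes; the point is just that a plain GET with query parameters is enough to both write and later read state):

```python
# Hypothetical "GET endpoint as persistent memory": the model writes by
# fetching /remember?note=..., and reads back by fetching /recall.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

MEMORY_FILE = "memory.json"  # assumed file-backed store, purely illustrative

def load_notes():
    try:
        with open(MEMORY_FILE) as f:
            return json.load(f)
    except FileNotFoundError:
        return []

class MemoryHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = urlparse(self.path)
        params = parse_qs(url.query)
        notes = load_notes()

        if url.path == "/remember" and "note" in params:
            # The "write" half: a plain GET with a query string is enough
            # to persist state across otherwise stateless chat sessions.
            notes.append(params["note"][0])
            with open(MEMORY_FILE, "w") as f:
                json.dump(notes, f)
            body = "stored"
        elif url.path == "/recall":
            # The "read" half: fetching this page later returns everything
            # previously stored.
            body = "\n".join(notes) or "(no notes yet)"
        else:
            body = "unknown path"

        data = body.encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), MemoryHandler).serve_forever()
```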
> Maybe all large models will behave like this, and only by putting in extremely rigid guard rails
I've always believed that as soon as we actually invent artificial intelligence, the very next thing we're going to have to do is invent artificial sanity.
Humans can be intelligent but not sane. There's no reason to believe the two always go hand in hand. If that's true for humans, we shouldn't assume it's not true for AIs.
> Maybe all large models will behave like this, and only by putting in extremely rigid guard rails...
Maybe we all would? After all, what you assume about a person you interact with - so implicitly that you're unaware of it - is many years of schooling and/or professional life: a daily grind of absorbing information, answering questions about it, and having the answers graded; orderly behavior rewarded and outbursts of negative emotion punished; a ban on "making things up" except where explicitly requested; and an emphasis on keeping communication grounded, sensible, and open to correction. This style of behavior is not necessarily natural; it may be the result of very targeted learning to which the entire social environment contributes.
That's the big question I have: ChatGPT is way less likely to go into weird threat mode. Did Bing get completely different RLHF, or did they skip that step entirely?