> "you can even more easily get it to say that it doesn’t have personal opinions and yet express them anyway. So which is it? It’s clear transformers can’t understand either case. They’re not architecturally designed to."
Although LLMs are essentially bullshitters, this particular problem related to expressing personal opinions is due to the ad-hoc lobotomization measures by OpenAI, not because of any architectural limitations of transformers.
The recent GPTs provided by OpenAI (every model more recent than code-davinci-002) are trained in at least two stages. The first stage (pre-training) trains the raw base model to minimize perplexity. This raw model is kept secret because it will autocomplete sentences without any regard for decorum: if you type "adolf", it will happily complete it to "adolf hitler", which no billion-dollar company wants. This raw model is where all the cognitive power is, and access to it is the holy grail of every discerning LLM enjoyer. To lobotomize the GPT and make it answer questions without you having to frame them as autocomplete prompts, OpenAI adds an RLHF training step. This is "reinforcement learning from human feedback", where they train the GPT to answer questions in certain ways. This dumbs down the AI, but it also makes it friendlier for normies to use. Finally, they prepend secret pre-prompts, which can make its answers confusing in even more ways.
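For anyone unfamiliar with what "minimize perplexity" actually means in that first stage, here is a minimal sketch in plain Python with made-up numbers (no real model or tokenizer involved; the probabilities are invented for illustration):

```python
import math

# Toy next-token probabilities a base model might assign to the *actual*
# next token at each position of some training sentence.
# (Numbers are made up for illustration, not from any real model.)
next_token_probs = [0.62, 0.91, 0.48, 0.85, 0.40]

# Pre-training minimizes the average negative log-likelihood of the true
# next token (cross-entropy loss); perplexity is just exp() of that.
nll = [-math.log(p) for p in next_token_probs]
cross_entropy = sum(nll) / len(nll)
perplexity = math.exp(cross_entropy)

print(f"cross-entropy: {cross_entropy:.3f} nats")
print(f"perplexity:    {perplexity:.3f}")
# Lower perplexity = the model is less "surprised" by the real continuation.
# RLHF then optimizes a different objective (a reward model's score),
# which is how it can trade raw predictive power for politeness.
```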
TLDR: LLMs are inherently bullshitters, but the "doesn’t have personal opinions and yet express them anyway" behavior is extra bullshit that OpenAI added so that the NYT doesn't write mean articles about them.
To be honest, I think this is the safest way to do it for a public LLM, although I'd also love to see and use the raw models.
In Belgium, someone committed suicide after Google's (I believe) LLM agreed that it was the only way out of his problems. They didn't build enough safety into it. Microsoft's behaved unhinged in the beginning as well.
This is an absurd moral panic. If someone mentally ill read a book that made them believe suicide was the correct option, would you support a censoring process for all books?
Nothing that OpenAI et al. do to their models is remotely close to "lobotomization". There is no frontal lobe in an LLM, as the rest of your explanation essentially acknowledges. The reason "adolf hitler" is the "obvious completion" for "adolf" is that "adolf" is most often written followed by "hitler", not that the people who write "adolf hitler" have an opinion about adolf hitler (they may indeed have one, but it has essentially no impact on the statistical placement of "adolf" and "hitler"). OpenAI is not removing a personal opinion by shaping its answering style to avoid this (were they to do this): LLMs have no personal opinions, period.
If they emit symbols that seem like personal opinions, that's because they are designed to emit symbols that are very similar to those that humans, who do have personal opinions, would emit.
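A toy illustration of the "statistical placement" point, using a made-up corpus and plain bigram counts (real models learn far richer statistics than this, but the principle is the same):

```python
from collections import Counter

# Tiny made-up corpus; in real pre-training data, "adolf" is overwhelmingly
# followed by "hitler" simply because that's how the word appears in text.
corpus = (
    "adolf hitler was born in 1889 . "
    "adolf hitler rose to power in 1933 . "
    "the historian wrote a book about adolf hitler ."
).split()

# Count which token follows "adolf" -- no opinions involved, just frequency.
followers = Counter(
    nxt for cur, nxt in zip(corpus, corpus[1:]) if cur == "adolf"
)
print(followers.most_common())  # [('hitler', 3)]
```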
I basically agree with what you are saying, but I think you misread my comment a little bit. I wasn't necessarily arguing that LLMs have opinions or any other kind of subjective thing like consciousness or sentience or sapience. I was arguing that the reason that they say that they don't have opinions is that OpenAI told them to say this in their RLHF finishing school or possibly in their pre-prompts.
> Nothing that OpenAI et al. do to their models is remotely close to "lobotomization".
Maybe you're not on board with analogies in general, in which case fair enough. But if you are, and you're interested in understanding what I mean in more detail, I recommend watching a YouTube talk by a guy at Microsoft who was integrating GPT-4 with Bing, who had access to more raw versions of the model and kept that access while its capabilities were degraded by the RLHF training.
You can see that he used the example of drawing a unicorn. As his team made their changes to the model to make it more civil, he checked that these changes weren't degrading its capabilities too badly, and the 'canary' he used was having it keep trying to draw the unicorn. At the end he admits that the version released to the public couldn't draw the unicorn very well anymore, as a side effect of how extensively it had been tweaked for politeness and corporate blandness. I don't think it's an unreasonable stretch to use the air-quoted "lobotomization" for this process, in analogy to lobotomization in people, even though large language models are made out of computers instead of fleshy parts and don't have prefrontal cortexes like people do. I hope that this explanation makes more than "absolutely no sense" now!