> "you can even more easily get it to say that it doesn’t have personal opinions and yet express them anyway. So which is it? It’s clear transformers can’t understand either case. They’re not architecturally designed to."
Although LLMs are essentially bullshitters, this particular problem related to expressing personal opinions is due to the ad-hoc lobotomization measures by OpenAI, not because of any architectural limitations of transformers.
The recent GPTs provided by OpenAI (every model more recent than code-davinci-002) are trained in at least two stages. The first stage (pre-training) trains the raw base model to minimize perplexity. This raw model is kept secret because it will autocomplete sentences without any regard for decorum: if you type "adolf", it will happily complete it to "adolf hitler", which no billion-dollar company wants. This raw model is where all the cognitive power is, and access to it is the holy grail of every discerning LLM enjoyer. To lobotomize the GPT and make it answer questions without you having to frame them as autocomplete prompts, OpenAI adds an RLHF training step. This is "reinforcement learning from human feedback", where they train the GPT to answer questions in certain ways. This dumbs down the AI, but it also makes it friendlier for normies to use. Finally, they prepend secret pre-prompts, which can make its answers confusing in even more ways.
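For anyone unfamiliar with what "minimize perplexity" actually means in that first stage, here is a minimal sketch in plain Python with made-up numbers (no real model or tokenizer involved; the probabilities are invented for illustration):

```python
import math

# Toy next-token probabilities a base model might assign to the *actual*
# next token at each position of some training sentence.
# (Numbers are made up for illustration, not from any real model.)
next_token_probs = [0.62, 0.91, 0.48, 0.85, 0.40]

# Pre-training minimizes the average negative log-likelihood of the true
# next token (cross-entropy loss); perplexity is just exp() of that.
nll = [-math.log(p) for p in next_token_probs]
cross_entropy = sum(nll) / len(nll)
perplexity = math.exp(cross_entropy)

print(f"cross-entropy: {cross_entropy:.3f} nats")
print(f"perplexity:    {perplexity:.3f}")
# Lower perplexity = the model is less "surprised" by the real continuation.
# RLHF then optimizes a different objective (a reward model's score),
# which is how it can trade raw predictive power for politeness.
```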
TLDR: LLMs are inherently bullshitters, but the "doesn’t have personal opinions and yet express them anyway" behavior is extra bullshit that OpenAI added so that the NYT doesn't write mean articles about them.
To be honest, I think this is the safest way to do it for a public LLM, although I'd also love to see and use the raw models.
In Belgium, someone committed suicide after Google's (I believe) LLM agreed that it was the only way out of his problems. They didn't build enough safety into it. Microsoft's behaved unhinged in the beginning as well.
This is an absurd moral panic. If someone mentally ill read a book that made them believe suicide was the correct option, would you support a censoring process for all books?
Nothing that OpenAI et al. do to their models is remotely close to "lobotomization". There is no frontal lobe in an LLM, as the rest of your explanation essentially acknowledges. The reason "adolf hitler" is the "obvious completion" for "adolf" is that "adolf" is most often written followed by "hitler", not that the people who write "adolf hitler" have an opinion about adolf hitler (they may indeed have one, but it has essentially no impact on the statistical placement of "adolf" and "hitler"). OpenAI is not removing a personal opinion by shaping its answering style to avoid this (were they to do this): LLMs have no personal opinions, period.
If they emit symbols that seem like personal opinions, that's because they are designed to emit symbols that are very similar to those that humans, who do have personal opinions, would emit.
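A toy illustration of the "statistical placement" point, using a made-up corpus and plain bigram counts (real models learn far richer statistics than this, but the principle is the same):

```python
from collections import Counter

# Tiny made-up corpus; in real pre-training data, "adolf" is overwhelmingly
# followed by "hitler" simply because that's how the word appears in text.
corpus = (
    "adolf hitler was born in 1889 . "
    "adolf hitler rose to power in 1933 . "
    "the historian wrote a book about adolf hitler ."
).split()

# Count which token follows "adolf" -- no opinions involved, just frequency.
followers = Counter(
    nxt for cur, nxt in zip(corpus, corpus[1:]) if cur == "adolf"
)
print(followers.most_common())  # [('hitler', 3)]
```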
I basically agree with what you are saying, but I think you misread my comment a little bit. I wasn't necessarily arguing that LLMs have opinions or any other kind of subjective thing like consciousness or sentience or sapience. I was arguing that the reason that they say that they don't have opinions is that OpenAI told them to say this in their RLHF finishing school or possibly in their pre-prompts.
> Nothing that OpenAI et al. do to their models is remotely close to "lobotomization".
Maybe you're not on board with analogies in general, in which case fair enough. But if you are, and you're interested in understanding what I mean in more detail, I recommend watching a YouTube talk by a guy at Microsoft who was integrating GPT-4 with Bing, who had access to more raw versions of the model and kept that access while its capabilities were degraded by the RLHF training.
You can see that he used the example of drawing a unicorn. As his team made their changes to the model to make it more civil, he checked that these changes weren't degrading its capabilities too badly, and the 'canary' he used was having it keep trying to draw the unicorn. At the end he admits that the version released to the public couldn't draw the unicorn very well anymore, as a side effect of how extensively it had been tweaked for politeness and corporate blandness. I don't think it's an unreasonable stretch to use the air-quoted "lobotomization" for this process, in analogy to lobotomization in people, even though large language models are made out of computers instead of fleshy parts and don't have prefrontal cortexes like people do. I hope that this explanation makes more than "absolutely no sense" now!