4o will get the answer right on the first go if you ask it "Search the Internet ...

paulcole · on July 24, 2024

I didn't even need to do that. 4o got it right straight away with just:

"how many r's are in strawberry?"

The funny thing is, I replied, "Are you sure?" and got back, "I apologize for the mistake. There are actually two 'r's in the word strawberry."

ofrzeta · on July 24, 2024

I tried to replicate your experiment in German, where "Erdbeere" has four e's, and it went the same way. The interesting thing was that after I pointed out the error, I couldn't get it to doubt the result again. It stuck to the correct answer, which seemed kind of "reinforced".

It was also interesting to observe how GPT (4o) even tried to prove/illustrate the result typographically by placing the same word four times and putting the respective letter in bold font (without being prompted to do that).
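For anyone who wants a ground truth to compare the model's answers against, here is a minimal sketch of the same count-and-highlight idea in plain Python. The function names `count_letter` and `highlight_each` are made up for illustration, and the `**…**` markers are just a stand-in for bold; this is not how GPT produces its output, only a deterministic check.

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


def highlight_each(word: str, letter: str) -> list[str]:
    """Return one copy of the word per occurrence of the letter,
    with that single occurrence wrapped in ** ** (a stand-in for bold)."""
    copies = []
    for i, ch in enumerate(word):
        if ch.lower() == letter.lower():
            copies.append(f"{word[:i]}**{ch}**{word[i + 1:]}")
    return copies


if __name__ == "__main__":
    # The two words discussed in this thread.
    for word, letter in [("strawberry", "r"), ("Erdbeere", "e")]:
        print(word, letter, count_letter(word, letter))
        for copy in highlight_each(word, letter):
            print("  ", copy)
```

The comparison is case-insensitive so that the capital "E" in "Erdbeere" is counted along with the lowercase ones.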

jcheng · on July 24, 2024

GPT-4o-mini consistently gives me this:

> How many times does the letter “r” appear in the word “strawberry”?

> The letter "r" appears 2 times in the word "strawberry."

But also:

> How many occurrences of the letter “r” appear in the word “strawberry”?

> The word "strawberry" contains three occurrences of the letter "r."

brandall10 · on July 24, 2024

Neither phrase is causing the LLM to evaluate the word itself; the phrasing just helps focus it toward particular parts of the training data.

Using more 'erudite' speech is a good technique to help focus an LLM on training data from folks with a higher education level.

Using simpler speech opens up the floodgates more toward the general populace.

brandall10 · on July 24, 2024

All that's happening is it finds 3 most commonly in the training set. When you push it, it responds with the next most common answer.

paulcole · on July 24, 2024

But then why does it stick to its guns on other questions but not this one?

brandall10 · on July 25, 2024

I haven't played with this model, but I rarely find that to be the case when working w/ Claude or GPT-4. If you say it's incorrect, it will give you another answer instead of insisting on correctness.

paulcole · on July 25, 2024

Wait what? You haven’t used 4o and you confidently described how it works?

brandall10 · on July 25, 2024

It's how LLMs work in general.

If you find a case where forceful pushback is sticky, it's either because the primary answer is overwhelmingly present in the training set compared to the next best option, or because there are conversations in the training set that followed similar stickiness, esp. if the structure of the pushback itself is similar to what is found in those conversations.

paulcole · on July 25, 2024

Right... except you said:

> If you say it's incorrect, it will give you another answer instead of insisting on correctness.

> When you push it, it responds with the next most common answer.

Which clearly isn't as black and white as you made it seem.

brandall10 · on July 25, 2024

I'll put it another way - behavior like this is extremely rare in my experience. I'm just trying to explain if one encounters it why it's likely happening.