My personal theory is that the newest models are not reliably more capable enoug...

Bjartr · 2025-04-28T12:40:24 1745844024

I wonder if this is similar to how that experiment[0] where they attempted to domesticate foxes led to traits like floppy ears.

Which is to say, even when attempting to objectively select for "well aligned" behavior, human tendency to favor non-material signals of "friendliness" still leaks in.

[0] https://en.m.wikipedia.org/wiki/Domesticated_silver_fox