To chime in on one point here: I think you're wrong about what an LLM is. You're technically correct about how an LLM is designed and built, but I don't think your conclusions are correct or supported by most research and researchers.
In terms of the Jedi IQ Bell curve meme:
Left: "LLMs think like people a lot of the time"
Middle: "LLMs are tensor operations that predict the next token, and therefore do not think like people."
Right: "LLMs think like people a lot of the time"
There's a good body of research indicating that as these models scale up we see emergent abilities, theory-of-mind behavior, and other evidence that they learn deep summarization and pattern matching during training.
Notice that your own example assumes models summarize "male-coded" vs. "female-coded" names; I'm sure they do. Interpretability research suggests they also learn far more exotic and interesting concepts, like "occasional bad actor when triggered." Upshot: I propose they're close enough here to anthropomorphize usefully in some instances.