Most LLMs are deterministic, but the tooling around them samples randomly from the output distribution to let users explore the nearby space of responses without having to write infinitely nuanced prompts. You can turn this off (e.g. by setting the sampling temperature to 0).
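To make the distinction concrete, here's a minimal sketch of what "turning it off" means. The function name and numbers are hypothetical; real serving stacks do the same thing on the logits the model emits for the next token:

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=None):
    """Pick the next token id from raw logits.

    temperature == 0 falls back to greedy argmax decoding, which is
    deterministic; temperature > 0 samples from the softmax distribution,
    which is what most chat tooling does by default.
    """
    if temperature == 0.0:
        return int(np.argmax(logits))          # deterministic: same logits, same token
    if rng is None:
        rng = np.random.default_rng()
    scaled = logits / temperature
    scaled = scaled - scaled.max()             # shift for numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return int(rng.choice(len(logits), p=probs))

logits = np.array([1.0, 5.0, 2.0])
sample_next_token(logits, temperature=0.0)     # always token 1
sample_next_token(logits, temperature=1.0)     # usually token 1, sometimes not
```

With temperature 0 the same logits always yield the same token; any positive temperature reintroduces run-to-run variation.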
However, the structure of OpenAI's GPT-4 is not deterministic. The most likely explanation I've seen is that they only activate some parts of the model for each input, and the parts are load-balanced so sometimes a different part of the model will be responding. https://news.ycombinator.com/item?id=37006224
This non-deterministic sampling isn't only for users to explore the space of responses. Without it, the LLM itself is prone to generating overly repetitive text.
> they only activate some parts of the model for each input
Perhaps you see seemingly random results because OpenAI is A/B testing multiple model versions, or different combinations of hyperparameters, to gather data for training GPT-5.
Nah; the paper mentioned above (from a few days ago here on HN) shows how GPT-4 is nondeterministic because the sparse mixture-of-experts technique it uses is nondeterministic with respect to batch positioning.
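A toy sketch of the mechanism, under the assumption (common in sparse-MoE serving) of top-1 routing with per-expert capacity limits; all names and numbers here are hypothetical:

```python
import numpy as np

def route(scores, capacity):
    """Top-1 MoE routing with a per-expert capacity cap.

    scores: (tokens, experts) gating scores. Tokens are processed in batch
    order; once an expert is full, later tokens spill to their next choice.
    """
    n_tokens, n_experts = scores.shape
    load = [0] * n_experts
    assignment = []
    for t in range(n_tokens):
        for e in np.argsort(-scores[t]):       # preferred experts, best first
            if load[e] < capacity:
                load[e] += 1
                assignment.append(int(e))
                break
    return assignment

# The SAME token can land on a different expert depending on its batchmates.
token = [0.9, 0.1]                              # prefers expert 0
batch_a = np.array([token, [0.2, 0.8]])         # batchmate prefers expert 1
batch_b = np.array([[0.95, 0.05], token])       # batchmate also wants expert 0

route(batch_a, capacity=1)   # our token gets expert 0
route(batch_b, capacity=1)   # expert 0 already full -> our token spills to expert 1
```

Since batches are assembled from whatever requests arrive together, which expert processes your token (and hence the exact floating-point result) depends on traffic you can't see or control.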
That article went past my level of expertise, which suggests that "easily" is, as you imply, a matter of perspective. It's possible the current behavior is a result of tradeoffs made for performance or cost. Modifications to make the model deterministic could depend on making unacceptable tradeoffs.