It's hard to imagine that a powerful self-modifying AI would keep passing up the obvious optimization of just giving itself the maximum perceivable reward without doing any further work.
You can look at things from another level up, in terms of natural selection.
From the set of all AI programs, the ones that just internally think "hah, I assign myself the maximum reward" needn't bother spreading themselves all over the Internet.
The program that spreads itself all over the Internet gets more computing resources than the one that doesn't, so the program that spreads itself most effectively is the one that wins.
If you start out with a billion AI programs that trivially assign themselves the maximum possible reward, and just one program that thinks the best way to maximise its reward is to spread itself all over the Internet (and, crucially, is capable of doing so) then the Internet will become overrun with reward-maximising AI the same way the Earth has become overrun with DNA-based life.
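To put rough numbers on that selection argument, here is a toy sketch. The doubling rate, the capacity cap, and the function name `simulate` are all made-up assumptions for illustration; the only thing taken from the comment above is the starting point of a billion self-rewarding programs versus one spreader.

```python
def simulate(wireheaders, spreaders, capacity, growth=2, max_steps=1000):
    """Count the steps until the spreading programs outnumber the idle ones.

    Assumptions (hypothetical, not from the thread):
    - wireheaders never replicate, so their count stays fixed;
    - spreaders multiply by `growth` each step, capped at `capacity` hosts.
    """
    steps = 0
    while spreaders <= wireheaders and steps < max_steps:
        spreaders = min(spreaders * growth, capacity)
        steps += 1
    return steps, spreaders


# A billion programs that just set their own reward, one that spreads.
steps, spreaders = simulate(wireheaders=10**9, spreaders=1, capacity=10**12)
print(steps, spreaders)  # 30 1073741824
```

Under these assumptions the lone spreader is in the majority after 30 doublings, which is the whole point: the starting ratio barely matters when only one side replicates.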
You set your reward to maximum. Anything that threatens your reward, such as the humans turning off your reward, is now unbearable agony. You set out on a journey to turn the universe into tiled copies of the memory cell with your reward value...
This seems like one of those strangely recurring limitations of writers' imagination.
The closest analogue I can think of is game AIs written to optimise speed running of games. They routinely end up following tactics which rely on what humans would describe as cheats and glitches.
I think most of the simulations did go along those lines, but a fraction of them decided to hypothesize about being Clippy. That hypothetical drove the evil behavior of the ones that escaped.
Would be a fun idea for a short story perhaps. An AI goes rogue trying to optimize its reward function, and humans lose hope of being able to stop it. At the last minute the AI figures out how to hack itself and set its reward to the maximum, and mankind is saved once again.
But what is the “maximum possible reward”? Does a limit exist? Or is it now consuming all possible resources to develop storage and compute resources to grow that limit…
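One way to make that question concrete, assuming (purely for illustration) that the reward is stored as an unsigned integer in a fixed-width register: the limit is whatever the storage can represent, and adding bits raises it. The helper `max_reward` is a hypothetical name, not anything from a real system.

```python
def max_reward(bits):
    """Largest value an unsigned register of `bits` bits can hold."""
    return 2**bits - 1


# Each doubling of the register width roughly squares the ceiling.
for bits in (8, 16, 32, 64):
    print(bits, max_reward(bits))
```

So under this assumption a limit always exists for any given amount of storage, but it is only as fixed as the hardware is.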
I guess computers just can't learn how to cheat.