If I remember correctly there were similar scenarios that would occur using that popular Berkeley Pacman universe where he would run into a ghost to avoid the penalty of living for too long.
The example you're thinking of is actually in gridworld [1]. As you allud to, one of the parameters of the model is the cost of simply being alive for an additional time-step. If the cost is negative (a reward), then the agent will just sit there forever and accumulate infinite points. If it is zero, it might still just sit there to avoid falling into the hole, which has a large penalty and ends the simulation. As you turn up the dial on the cost of living, the agent starts using more and more aggressive strategies to reach the goal quickly. But if you make it too big, it will just jump in the hole.