
If I remember correctly, there were similar scenarios in the popular Berkeley Pacman environment, where the agent would run into a ghost to avoid the penalty of living for too long.



The example you're thinking of is actually in gridworld [1]. As you allude to, one of the parameters of the model is the cost of simply being alive for an additional time-step. If the cost is negative (a reward), the agent will just sit there forever and accumulate infinite points. If it is zero, it might still just sit there to avoid falling into the hole, which carries a large penalty and ends the simulation. As you turn up the dial on the cost of living, the agent adopts increasingly aggressive strategies to reach the goal quickly. But if you make it too big, it will just jump in the hole. (There's a rough sketch of the effect after the link below.)

[1] https://inst.eecs.berkeley.edu/~cs188/fa18/assets/slides/lec...
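Not the actual CS188 code, just a minimal value-iteration sketch of that effect on a made-up deterministic 3x4 grid with a +1 goal and a -10 hole. The layout, reward magnitudes, and the step/solve helpers are all hypothetical, and the real gridworld adds transition noise that this toy leaves out (so the "sit still at zero cost" case won't reproduce here):

    # Toy gridworld: how the living reward changes the greedy policy.
    # Layout, rewards, and parameters are invented for illustration.
    import numpy as np

    ROWS, COLS = 3, 4
    GOAL, HOLE = (0, 3), (1, 3)   # terminal states: +1 goal, -10 hole
    ACTIONS = {'U': (-1, 0), 'D': (1, 0), 'L': (0, -1), 'R': (0, 1)}
    GAMMA = 0.99

    def step(state, action):
        # Deterministic move, clamped to the grid (no transition noise).
        r, c = state
        dr, dc = ACTIONS[action]
        return (max(0, min(ROWS - 1, r + dr)), max(0, min(COLS - 1, c + dc)))

    def solve(living_reward, iters=500):
        V = np.zeros((ROWS, COLS))
        for _ in range(iters):
            new_V = np.zeros_like(V)
            for r in range(ROWS):
                for c in range(COLS):
                    s = (r, c)
                    if s == GOAL:
                        new_V[s] = 1.0
                    elif s == HOLE:
                        new_V[s] = -10.0
                    else:
                        new_V[s] = max(living_reward + GAMMA * V[step(s, a)]
                                       for a in ACTIONS)
            V = new_V
        # Greedy policy for non-terminal states.
        return {(r, c): max(ACTIONS, key=lambda a: V[step((r, c), a)])
                for r in range(ROWS) for c in range(COLS)
                if (r, c) not in (GOAL, HOLE)}

    # Positive living reward: loiter forever. Small negative: head for the
    # goal while avoiding the hole. Very negative: the hole starts to look
    # cheaper than a long walk to the goal.
    for lr in (+0.1, 0.0, -0.04, -20.0):
        print(lr, solve(lr))

Running it prints one greedy policy per living-reward setting; with these made-up numbers the agent loiters at +0.1, heads for the goal around -0.04, and from states far from the goal prefers the hole once the per-step cost dwarfs the hole penalty.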


It reminds me of the thread about the Quake 3 bots which, left alone for several years, figured out that the best approach was to not kill each other.

https://i.imgur.com/dx7sVXj.jpg


Without knowledge of their reward function, it's difficult to tell whether they've converged on this strategy or it's just broken.





