Will you make a fast implementation of the environment available? The best AIs r...

joooyzee · on Sept 24, 2021

Getting the platform to the point where people can spend most of the time on the actual training and experiments (and less on the infrastructure) is our current goal. We do have a forward model simulator which should let you step through the environment without re-implementing it, but if that's not what you're after, we'd love to chat more on what we could do to make this easier (feel free to ping any of us on Discord https://discord.gg/tRUMgdfC).

P.S. Sounds like a cool project! Have you heard of the Hearthstone AI competition (https://hearthstoneai.github.io/)? Might be of interest to you.

nicoco · on Sept 24, 2021

Oh, I have been thinking about learning reinforcement learning by trying to make an STS AI too, nice! I eventually gave up, but would still be interested in seeing what can be done. Do you plan on releasing something at some point?

About re-implementing the environment, it is probably worth getting in touch with major STS modders and even streamers (jorbs comes to mind...), in case you have not done that already.

Buttons840 · on Sept 24, 2021

https://github.com/DevJac/solve_the_spire

I stretched the truth a bit: I'm actually doing something like "hierarchical model-free reinforcement learning". Even so, figuring out how to break the game down to create a hierarchy of agents is a lot of work. Basically, the AI is composed of about 8 different traditional RL agents (neural networks), each deciding a different thing. One chooses which cards to draft, one chooses which actions to take in combat, one chooses which path to take on the map, etc.
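A minimal sketch of that kind of decomposition, assuming one policy per decision type and a dispatcher that routes each decision to it. The decision names, the `HierarchicalAgent` class, and the random stand-in policy are all hypothetical and not taken from the linked repo.

```python
import random
from typing import Callable, Dict, List

# A policy maps (state, legal options) to one chosen option.
Policy = Callable[[dict, List[str]], str]

def random_policy(state: dict, options: List[str]) -> str:
    """Stand-in for a trained network: pick any legal option."""
    return random.choice(options)

class HierarchicalAgent:
    """Routes each decision to the sub-agent responsible for that decision type."""

    def __init__(self, policies: Dict[str, Policy]):
        self.policies = policies

    def act(self, decision_type: str, state: dict, options: List[str]) -> str:
        if decision_type not in self.policies:
            raise KeyError(f"no sub-agent for decision type {decision_type!r}")
        return self.policies[decision_type](state, options)

# Hypothetical decision types; a real agent would plug a network in for each one.
agent = HierarchicalAgent({
    "card_draft": random_policy,
    "combat": random_policy,
    "map_path": random_policy,
})
choice = agent.act("combat", {"energy": 3}, ["Strike", "Defend", "end_turn"])
```

Each sub-agent only ever sees the decisions of its own type, which is what makes the decomposition simple to train and also what limits coordination between the parts.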

Simple rules like "play random cards until your energy is used up" alone can sometimes beat the act 1 boss. My AI is barely above that, and still far from solving the game. I'm not convinced even DeepMind or other researchers could solve Slay the Spire right now.
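The "play random cards until your energy is used up" rule can be sketched as follows. The hand, the cost table, and the `random_turn` helper are illustrative assumptions, not code from the project; targeting and card effects are ignored.

```python
import random
from typing import Dict, List

def random_turn(hand: List[str], costs: Dict[str, int], energy: int,
                rng: random.Random) -> List[str]:
    """Play random affordable cards until no card left in hand can be paid for."""
    hand = list(hand)  # copy so the caller's hand is not modified
    played = []
    while True:
        affordable = [card for card in hand if costs[card] <= energy]
        if not affordable:
            return played
        card = rng.choice(affordable)
        hand.remove(card)
        energy -= costs[card]
        played.append(card)

# Illustrative hand and costs for a single turn with 3 energy.
costs = {"Strike": 1, "Defend": 1, "Bash": 2}
played = random_turn(["Strike", "Strike", "Defend", "Bash"], costs, 3, random.Random(0))
```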

It shows definite signs of improvement, but has only reached a point where it can beat the act 1 boss about 50% of the time. I think that is its limit right now. I'm doing policy gradient, which is very sample inefficient. I'm going to implement soft actor-critic and see if it can do better with better sample efficiency.

One thing I like about Slay the Spire is it's an environment to solve, not a competition. Gamers like to talk about PvP and PvE, well, I prefer AI vs environment over AI vs AI. In the end, an AI will win the competition, no surprise. An AI solving a new kind of environment is much more exciting IMHO.

chongli · on Sept 24, 2021

I feel like a traditional expert system would work a lot better in Slay the Spire at this stage. The choices you make in the game are all highly interrelated so I'm not sure they can be broken down into separate agents like that.

For example, when deciding what cards to play you often need to take into account what is coming up next on the map; it is not sufficient to consider only how to win the current fight. Relics such as Incense Burner carry over their turn counters between fights, so it's a strong strategy to delay the end of the current fight in order to set up an optimal Incense Burner number for the next fight. What number that counter should be is highly dependent on which enemies/elites/bosses you'll be facing in the next fight.

An expert system would have a database of every opponent in the game and when they are likely/guaranteed to appear and then seek to optimize the various conditions at the end of the current fight so that the next fight goes as smoothly as possible. I don't see how this could be accomplished with separate agents each attempting to play a different component of the game in isolation.
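As a rough sketch of one such rule: look up the counter value wanted for the next fight and compute how many extra turns to stall in the current one. The `PREFERRED_COUNTER` table is a hypothetical placeholder rather than real game knowledge, and the counter period of 6 turns is an assumption.

```python
from typing import Dict

# Hypothetical lookup table: the counter value we would like to carry into
# each kind of fight. These entries are placeholders, not tuned game data.
PREFERRED_COUNTER: Dict[str, int] = {"elite": 5, "boss": 5, "normal": 0}

def turns_to_stall(counter: int, next_fight: str, period: int = 6) -> int:
    """Extra turns to wait so a counter that advances by one per turn
    (wrapping at `period`, assumed to be 6 here) reaches the preferred value."""
    target = PREFERRED_COUNTER[next_fight]
    return (target - counter) % period

stall = turns_to_stall(counter=2, next_fight="elite")
```

A full expert system would need many rules like this one, plus a way to weigh the damage taken while stalling against the benefit in the next fight.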

Buttons840 · on Sept 25, 2021

You may be right, but that's a lot of boring work I didn't want to do. It was much more fun to hook up a neural net and watch it learn at least a few things. The combat agent does know where it is on the map, but it was only rewarded for minimizing damage taken in fights, so it would probably never learn to set up Pen Nib.
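To illustrate why, here is a sketch of a per-fight reward of that shape. The `combat_reward` function and its arguments are assumptions for illustration, not the project's actual reward code.

```python
def combat_reward(hp_before: int, hp_after: int) -> float:
    """Reward for one fight: the change in HP, so losing less HP scores higher."""
    return float(hp_after - hp_before)

# Two hypothetical outcomes of the same fight with identical damage taken.
# Only the relic counter carried into the next fight differs, and the
# reward cannot see it.
reward_with_counter_set_up = combat_reward(hp_before=70, hp_after=62)
reward_without_set_up = combat_reward(hp_before=70, hp_after=62)
```

Both outcomes score the same, so nothing in the training signal favors the one that leaves a relic counter in a better position for the next fight.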

thegalah · on Sept 24, 2021

Matt from Coderone here: we have made a forward model of the environment available by default through the game engine, so there shouldn't need to be "much" work. There are definitely friction points and some more abstractions can definitely be made; happy to iterate off any feedback provided.

Madmallard · on Sept 24, 2021

Who is to say there isn’t a simple strategy that’s also optimal? It’s not exactly a complex game.

Buttons840 · on Sept 24, 2021

Maybe. All approaches can be tried, that's part of the fun. I'm just saying that the best algorithm we know of for solving games in general requires a model, and so if a model is made available to everyone it will save people some work.
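To show what a planner needs from a model, here is a minimal sketch of flat Monte Carlo planning, a much simpler relative of tree-search methods. The toy counting game and the `legal_actions`/`step` interface are assumptions made up for illustration; they are not the Coderone forward model API or a Slay the Spire simulator.

```python
import random
from typing import List, Tuple

TARGET = 10  # toy game: add 1, 2 or 3 to a running total; landing exactly on TARGET wins

def legal_actions(state: int) -> List[int]:
    return [1, 2, 3]

def step(state: int, action: int) -> Tuple[int, float, bool]:
    """Toy forward model: returns (next_state, reward, done) without touching a real game."""
    nxt = state + action
    if nxt == TARGET:
        return nxt, 1.0, True
    if nxt > TARGET:
        return nxt, 0.0, True
    return nxt, 0.0, False

def rollout(state: int, rng: random.Random) -> float:
    """Play random actions from `state` until the episode ends; return the total reward."""
    total, done = 0.0, False
    while not done:
        state, reward, done = step(state, rng.choice(legal_actions(state)))
        total += reward
    return total

def plan(state: int, rng: random.Random, n_rollouts: int = 50) -> int:
    """Flat Monte Carlo: score each action by the average return of random rollouts after it."""
    def value(action: int) -> float:
        nxt, reward, done = step(state, action)
        if done:
            return reward
        return reward + sum(rollout(nxt, rng) for _ in range(n_rollouts)) / n_rollouts
    return max(legal_actions(state), key=value)

best = plan(9, random.Random(0))  # from 9, only adding 1 lands on TARGET
```

The planner calls `step` many times per decision, which is why a fast model that does not require re-implementing the game matters so much for this family of methods.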