Monte Carlo Tree Search
Magic: The Gathering AI
Authors
Joe Agajanian and Taylor Brent
5/10/13
Abstract

The methods available for creating artificially intelligent players differ from game to game. For games such as Magic: The Gathering and Go, the available methods of searching a decision tree are very limited: the large branching factor, combined with incomplete knowledge of the board state, strains both runtime and memory. To handle the large branching factor, Monte Carlo Tree Search (MCTS) can be used to reduce the number of branches visited while still making close-to-optimal decisions at each step of the game. Using this algorithm, we created a Magic: The Gathering AI; this paper explores the advantages and disadvantages of using MCTS.

Overview of the Game

The rules and an in-depth explanation of how to play Magic: The Gathering (MTG) can be found at http://www.wizards.com/Magic.
However, because the focus of this paper is researching MCTS algorithms, the game was simplified to make testing possible by reducing the time and complexity of each move. The rest of this paragraph briefly explains how the game works, along with the simplifications we have made. The game proceeds in alternating turns between the 2 players. Each player has a deck of 60 cards that can be (and usually is) completely different from the opponent's; the pool of cards to choose from currently exceeds 15,000. Each turn has a "Main Phase," where creatures in your hand can be played; an "Attack Phase," where the current player can attack in order to deal damage to the opponent; and a "Block Phase," where the opposing player can block the current player's creatures in order to avoid losing life. The link above goes into more detail on each of these phases. Each player starts with a life total of 20 and an initial deck of 60 cards, drawing seven cards at the start of the game and one card at the start of each subsequent turn. If a player's life total reaches 0 (from taking too many attacks), or if the number of cards left in a player's deck reaches 0 (from drawing more than your opponent), that player loses the game.
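The starting conditions and loss conditions above can be captured in a small amount of state. The following sketch is illustrative only (the class and field names are our own for this paper, not taken from the project code):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class PlayerState:
    """Minimal per-player state for the simplified game (names are illustrative)."""
    life: int = 20                                   # starting life total
    deck: List[str] = field(default_factory=list)    # 60 cards at setup
    hand: List[str] = field(default_factory=list)    # 7 cards drawn at the start

    def draw(self) -> bool:
        """Draw one card; return False if the deck is empty (a loss condition)."""
        if not self.deck:
            return False
        self.hand.append(self.deck.pop())
        return True

    def has_lost(self) -> bool:
        # A player loses at 0 life; an empty deck on a forced draw also loses,
        # which is signaled by draw() returning False.
        return self.life <= 0
```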

Implementation

For this project, we implemented several variations of MCTS based on ideas from "A Survey of Monte Carlo Tree Search Methods" (Browne et al., 2012). The MCTS algorithm we are currently using finds all possible states that can result from a single move from the current board state. For each of these board states, a number of random simulations (20) are run to an end state, and the first move associated with the highest projected likelihood of success (the largest number of projected wins) is chosen as the AI player's move.
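This flat, one-level move selection can be sketched as follows. The hooks `apply_move` and `simulate_random_game` are assumptions standing in for the project's game engine: the first returns the successor state for a move, and the second plays random moves to an end state and returns 1 if the AI player wins, else 0.

```python
def choose_move(state, legal_moves, apply_move, simulate_random_game, n_sims=20):
    """Flat Monte Carlo move selection, as described above (a sketch).

    For each legal move, run n_sims random playouts from the resulting
    board state and pick the move with the most simulated wins.
    """
    best_move, best_wins = None, -1
    for move in legal_moves:
        child = apply_move(state, move)
        wins = sum(simulate_random_game(child) for _ in range(n_sims))
        if wins > best_wins:
            best_move, best_wins = move, wins
    return best_move
```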

One variation we explored was evaluating the expanded board states with a heuristic before running simulations, choosing a subset of candidate moves before exploring all possible branches in order to reduce runtime. Alternatively, rather than pruning the initial moves, we can cut off each random simulation at a fixed depth and use a heuristic on the resulting board state to score the move. A combination of these two techniques decreases the runtime significantly, but also significantly degrades play quality, because fewer branches are explored and each branch is simulated to a shallower depth.
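The depth-cutoff variant can be sketched as below. Here `heuristic(state)` is an assumed scoring function (e.g. derived from life totals and board presence, returning an estimate in [0, 1] of the AI player's winning chances); `legal_moves_fn` and `apply_move` are the same assumed game-engine hooks as before.

```python
import random

def simulate_to_depth(state, legal_moves_fn, apply_move, heuristic, max_depth=10):
    """Random playout cut off at max_depth, scored by a heuristic (a sketch).

    Instead of playing to an end state, make random moves for at most
    max_depth plies, then evaluate the resulting board heuristically.
    """
    for _ in range(max_depth):
        moves = legal_moves_fn(state)
        if not moves:                # terminal state reached before the cutoff
            break
        state = apply_move(state, random.choice(moves))
    return heuristic(state)
```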

Another implementation, which we built but found too slow to run to completion, was a heuristically guided swarm tree search. This variation continues the breadth-first expansion beyond the first level, and after a certain point (constrained by memory) the random-play simulation is executed as normal. The motivation for this method is that, due to incomplete knowledge of the board state, a Magic: The Gathering player can likely make short-term predictions, but beyond a certain horizon the AI is blind to the moves the opponent might make. While we believe this implementation is as good as or better than the algorithm we are currently using, the runtime and space required to see significant gains in a reasonable time were prohibitive.

Lastly, we created a random player that we have the option of playing against (we can also pit two MCTS AIs against each other). As in other games, while the random player is not guaranteed to play efficiently, it still has the potential to play rather well and serves as a good baseline against which to test our MCTS AI.

Results

The first result we were able to determine was that, with decks consisting only of creatures and lands (used to play creatures), one of the decks had a clear advantage. We pitted a typical 'green' deck (which tends to be slow in the early phases of the game but aims to stall until late) against a typical 'white' deck (which tends to go all-in during the early phases of the game) and found a clear winner. Across multiple 1000-round simulations using two random players, the 'white' deck showed a very noticeable and consistent advantage. This gave us a baseline for testing whether our MCTS AI player would match these results.

What we noticed, despite the limitations we had in testing (which will be explained later), was that MCTS fairly consistently wins with the 'white' deck when the opponent is using the 'green' deck. There are still many games the 'green' creature deck wins, which is to be expected given the randomness of the game and the relatively small number of random simulations (20) our time constraints allowed. Another scenario we investigated was the mirror match, where identical moves on both sides effectively cancel each other out. This led to both sides deciding it was never advantageous to attack or try to end the game. While we did not anticipate this before the start of the project, the AI revealed it in testing: each turn consisted of the players only playing creatures, and no attacks were ever made, because the player who attacked first would most likely lose due to the nature of the game. Eventually, when the second player - who always has one less card in his deck - realizes during the later turns that he will lose if no action is taken, he changes his approach and begins to attack.

One of the unfortunate consequences of using MCTS for Magic: The Gathering is the space consumed and runtime required due to the branching factor. While MCTS copes with large branching factors by performing many random simulations, if you lack the computing power to run enough of them, MCTS fails to give a near-optimal solution. As the number of random simulations performed by MCTS increases, the algorithm converges toward minimax and tends to explore a larger number of branches. We found 20 random simulations to be the smallest number that produced a reasonable AI; increasing the quality of our AI further is limited by our computing resources, given the exponential number of possible states.

This can be explained by the following hypothetical situation. Suppose the current player has 7 playable creatures in hand, of which at most 4 can be played: that gives 7 choose 4 (35) different sets of creatures to play. For each of these sets, there are also 5 creatures already in play, for 9 creatures total. Say 7 of these creatures can attack this turn (due to game-mechanics constraints). Each of those 7 creatures can either attack or not, giving the power set of 7, or 128, different ways to attack. Next, suppose the opponent has 8 blockers in play, each able to block any of the 7 attacking creatures. Then for each of the 128 ways of attacking (which average 3 creatures each), there are 8 choose 3 (56) ways the attackers can be blocked, producing a branching factor of (128)(35)(56) = 250,880 for a single turn (combining playing creatures, attacking, and blocking). If we perform 100 random simulations with an average depth of 25 for each of these branches, we would have to examine 627,200,000 possibilities before deciding on a single move. While this can be reduced by taking advantage of similar board states, storing all of those board states would create memory problems as well. While 100 simulations per branch is computationally feasible (if the user is willing to wait a significant amount of time), even this number of random simulations covers only a small subspace of the possible scenarios. With 60 cards in a deck and 7 cards drawn at the start of the game, this leaves (46!)^2 possible draws that can occur over the course of a game.
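The arithmetic in this hypothetical can be checked directly:

```python
from math import comb

plays   = comb(7, 4)   # 35 ways to choose 4 creatures to play from 7 in hand
attacks = 2 ** 7       # 128 subsets of 7 potential attackers (attack or not)
blocks  = comb(8, 3)   # 56 ways 8 blockers can cover an average of 3 attackers

branching = plays * attacks * blocks
print(branching)               # 250880 branches for a single turn

sims, depth = 100, 25
print(branching * sims * depth)  # 627200000 states touched per decision
```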

What this example is meant to show is that, given the limited number of random simulations we are able to run, we cover such an insignificant portion of the possible games (ignoring decisions entirely) that, toward the beginning of the game, our MCTS AI under its current limitations tends to play only marginally better than a random player: the number of possible outcomes following any given move is simply too vast.

While this may seem to be a serious problem, given that an oracle would play much better toward the beginning, as the deck sizes shrink and the decision tree shrinks exponentially with them, our MCTS AI not only does better than random but also matches the moves we would expect it to make based on expert knowledge of the game.

Future Work

As mentioned above, we found many advantages and disadvantages of our current MCTS MTG AI player. We saw that our AI performed much better during the later portions of the game (as expected) than during the early turns, due to the large branching factor. To address this, a heuristically guided swarm search (given more computing resources) would improve the AI. Central to this improvement is tuning the heuristic to mimic the intuitions of a human player; the success of a heuristic can only be determined through testing, leaving room for future work and improvement.

Beyond simply changing the AI we are using, Magic: The Gathering is a complex game with over 15,000 cards, of which roughly 30% are non-creature cards whose rules text would require natural language processing to understand in general. While we accounted for the basic rules necessary to play a novice game, there are many more rules we overlooked that become relevant as we extend our card pool.

In addition, an AI so dependent on runtime should really be written in a compiled language closer to the machine, such as C++, where objects and their references can be handled explicitly. Although Python made it easy to code and test our AI, it was not the optimal choice of language.

The Code

Here is the code.

References
"A Survey of Monte Carlo Tree Search Methods," C. Browne et al., IEEE Transactions on Computational Intelligence and AI in Games, 2012
Magic: The Gathering Rules
Gleemin AI
Programming in magic