Consider the problem of implementing a computer program to play a game. To simplify things a bit, we will only consider games with the following two properties:
- Two-player - we do not deal with coalitions, etc.
- Zero-sum - one player's win is the other's loss; there are no cooperative victories
Examples of these kinds of games include many classic board games, such as tic tac toe, chess, checkers, and go. For these types of games, we can model the game using what is called a game tree:
This is a section of a game tree for tic tac toe. Each node represents a board position, and the children of each node are the legal moves from that position. We give each position favorable for player 1 a positive score (the more positive, the more favorable), and each position favorable for player 2 a negative score (the more negative, the more favorable). In our tic tac toe example, player 1 is 'X', player 2 is 'O', and the only three scores are +1 for a win by 'X', -1 for a win by 'O', and 0 for a draw. Note that the blue scores are the only ones that can be computed by looking at the current position alone. To calculate the scores for the other positions, we must look ahead several moves, for example by using one of the algorithms below.
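This scoring convention can be made concrete. Below is a minimal sketch that scores a terminal tic tac toe board under that convention; the board representation (a 3x3 list of 'X', 'O', or None) is our own choice for illustration, not something the discussion above prescribes.

```python
def score(board):
    """Return +1 for a win by 'X', -1 for a win by 'O', else 0."""
    lines = (board                                    # rows
             + [list(col) for col in zip(*board)]     # columns
             + [[board[i][i] for i in range(3)],      # main diagonal
                [board[i][2 - i] for i in range(3)]]) # anti-diagonal
    for line in lines:
        if line == ['X'] * 3:
            return +1   # win for player 1 ('X')
        if line == ['O'] * 3:
            return -1   # win for player 2 ('O')
    return 0            # draw (or a position not yet decided)

x_wins = [['X', 'X', 'X'],
          ['O', 'O', None],
          [None, None, None]]
print(score(x_wins))  # → 1
```

Note that this function only recognizes finished positions; as described above, every other score must come from looking ahead.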
Now that we have a way of representing the game in our program, how do we compute our optimal move? We will assume that the opponent is rational; that is, the opponent can compute moves just as well as we can, and will always choose the optimal move on the assumption that we, too, will play perfectly. (Contrast this, for example, with the beginning chess player who deliberately plays a move that sets a trap, in the hope of luring the opponent into it and gaining a quick victory. If the opponent does not fall for the trap, however, our player finds that their position is now critically weakened.)
One algorithm for computing the best move is the minimax algorithm:
If the game is over in the given position, then there is nothing to compute; minimax simply returns the score of the board. Otherwise, minimax goes through each possible child and, by recursively calling itself, evaluates each possible move. The best move is then chosen, where 'best' means the move leading to the board with the most positive score for player 1, or the most negative score for player 2.
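As a concrete sketch, here is minimax in Python over an explicit game tree, where a leaf is just its score and an internal node is a list of its children. This toy representation is our own, chosen only to keep the example self-contained; a real program would generate children from a board position on the fly.

```python
def minimax(node, maximizing):
    """Score a node: a number is a finished game, a list holds the children."""
    if isinstance(node, (int, float)):   # game over: just return the score
        return node
    scores = [minimax(child, not maximizing) for child in node]
    # Player 1 (max) picks the most positive score; player 2 (min) the most negative.
    return max(scores) if maximizing else min(scores)

# A tiny tree: player 1 to move, with two candidate moves.
tree = [
    [+1, 0],    # move A: the opponent can hold us to a draw
    [-1, -1],   # move B: the opponent wins either way
]
print(minimax(tree, True))  # → 0 (move A is best: it guarantees a draw)
```

With the rational-opponent assumption above, the recursion alternates between maximizing and minimizing at each level, which is exactly what the `not maximizing` flip expresses.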
How long does this algorithm take? For a simple game like tic tac toe, not too long - it is certainly possible to search all possible positions. For a game like chess or go, however, the running time is prohibitively expensive. In fact, to completely search either of these games, we would first need to develop interstellar travel, as by the time we finish analyzing a move the sun will have gone nova and the earth will no longer exist. Therefore, all real computer games search not to the end of the game, but only a few moves ahead. Of course, the program must then determine whether a given board position is 'good' or 'bad' for a certain player. This is usually done with an evaluation function, and this function is the key to a strong computer player: it does little good to look ahead 20 moves if, after we do, we decide the position is good for us when in fact it is terrible!
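A depth-limited search then looks like the sketch below. Here `evaluate` stands in for a real evaluation function - in this toy version it just reads a precomputed heuristic score off the node, and the dict-based tree representation and all names are illustrative, not a prescribed API.

```python
def evaluate(position):
    # A real engine would weigh material, mobility, king safety, etc.
    # This stub just reads a precomputed heuristic score.
    return position["score"]

def minimax_limited(position, depth, maximizing):
    children = position.get("children", [])
    if depth == 0 or not children:        # cut off the search: fall back
        return evaluate(position)         # on the heuristic estimate
    scores = [minimax_limited(c, depth - 1, not maximizing)
              for c in children]
    return max(scores) if maximizing else min(scores)

tree = {"score": 0, "children": [
    {"score": 5},
    {"score": -2, "children": [{"score": 9}]},
]}
print(minimax_limited(tree, 1, True))  # shallow search trusts the -2 estimate
print(minimax_limited(tree, 2, True))  # deeper search sees the 9 behind it
```

The two calls illustrate the caveat above: the shallow search judges the second move by its misleading heuristic score, while one more ply of lookahead reverses that judgment.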
There is an optimization we can make to the simple algorithm above that will allow us to search to much deeper depth in the same amount of time.
To understand the basic idea, consider the following: It is our turn to move, and we have just finished analyzing a possible move (move A), and it gives us a large advantage. Now, we are attempting to analyze the next possible move (move B), and we discover that, in the very first response we consider for our opponent, our opponent can force a neutral position! Then there is no reason for us to continue examining move B. At this point, we know that the best we can do if we play move B is to gain a neutral position, and we already have move A which can guarantee us a better position. The extreme version of this is if we find a way to force a win with our move - in which case, we have no need to search anything further. We should just play it immediately and win the game.
To formalize this idea, we will keep track of two numbers, alpha and beta, for each node we analyze - hence the name of the algorithm, alpha-beta pruning. Alpha is the value of the best move you have found so far; beta is the value of the best move your opponent has found so far. If at any time alpha >= beta, then your opponent's best reply forces a position no better for you than the move you already have, so there is no need to evaluate this move any further. The same pruning condition applies if you are the min player, except that instead of updating alpha with your best move, you update beta.
To guarantee that this algorithm returns a move, we start alpha at -infinity (before examining any moves, the best you can guarantee is a loss) and beta at +infinity (the worst your opponent can hold you to is a win), and update these values as we examine more nodes.
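Putting the two bounds together gives the standard alpha-beta search. The sketch below runs over an explicit toy tree in which a leaf is just its score and an internal node is a list of its children (a representation we choose for illustration).

```python
def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf")):
    if isinstance(node, (int, float)):   # game over: return the score
        return node
    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, best)     # best move found for us so far
            if alpha >= beta:            # opponent will never allow this line
                break                    # prune the remaining children
        return best
    else:
        best = float("inf")
        for child in node:
            best = min(best, alphabeta(child, True, alpha, beta))
            beta = min(beta, best)       # best reply found for the opponent
            if alpha >= beta:
                break
        return best

tree = [[3, 5], [2, [8, 9]]]
print(alphabeta(tree, True))  # → 3; the subtree [8, 9] is never examined
```

Once the second move's first reply comes back as 2, which is below the alpha of 3 already guaranteed by the first move, the `alpha >= beta` test fires and the `[8, 9]` subtree is skipped entirely.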
Notice that it is to our advantage to generate the most promising moves first at each node - the highest-scoring moves first for max, and the lowest-scoring first for min; this causes the algorithm to cut off sooner. With a good successor generator, alpha-beta can on average search roughly twice as deep as minimax in the same amount of time. Note also that alpha-beta returns the same score as minimax; it simply computes it faster.
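The effect of move ordering can be seen with a small experiment: counting how many leaves alpha-beta examines when the strongest move is searched first versus last. The explicit-tree representation (leaves are scores, internal nodes are lists of children) and the instrumented search are our own toy constructions.

```python
def alphabeta_counted(node, maximizing, alpha=float("-inf"), beta=float("inf")):
    """Return (score, number of leaves examined)."""
    if isinstance(node, (int, float)):
        return node, 1
    best = float("-inf") if maximizing else float("inf")
    leaves = 0
    for child in node:
        score, n = alphabeta_counted(child, not maximizing, alpha, beta)
        leaves += n
        if maximizing:
            best = max(best, score)
            alpha = max(alpha, best)
        else:
            best = min(best, score)
            beta = min(beta, best)
        if alpha >= beta:        # cutoff: skip the remaining children
            break
    return best, leaves

good = [[5, 4], [3, 2]]   # strongest move examined first
bad  = [[3, 2], [5, 4]]   # strongest move examined last
print(alphabeta_counted(good, True))  # same score, fewer leaves examined
print(alphabeta_counted(bad, True))
```

Even on this four-leaf tree the well-ordered search prunes one leaf while the badly ordered one prunes nothing; on deep trees this difference compounds at every level, which is where the "twice as deep" rule of thumb comes from.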
You can step through the two search algorithms to observe the pruning below: