Your program should prompt the user for an initial board size in rows and columns, and an integer n, which will be either three or four. (Connect Two is not a very interesting game, and Connect Five or larger would have too large a state space.) Your program will then use the minimax algorithm to traverse the entire game tree to determine if the first player (MAX) has a guaranteed win, the second player (MIN) has a guaranteed win, or neither player is guaranteed to win (either player can force a draw).
The minimax value for a terminal state (a state where MAX has won, MIN has won, or the game is a draw) should be calculated as follows. If MAX has won, the value of the state is int(10000 * rows * cols / moves), where moves is the number of moves the game has lasted. For instance, if you are playing on a 4-by-5 grid and MAX wins in 12 moves, then the value of this state is int(10000 * 4 * 5 / 12) = 16666. Dividing by the length of the game gives quicker wins a larger value, so the search will prefer them. If MIN has won, use the same formula, but negated. If there is a draw, the value of the state is zero.
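As a quick check of the formula above, here is a minimal sketch in Python. The function name and the "MAX"/"MIN"/None encoding of the winner are assumptions made for this example, not part of the assignment.

```python
from typing import Optional


def terminal_value(rows: int, cols: int, moves: int, winner: Optional[str]) -> int:
    """Value of a finished game.

    `winner` is "MAX", "MIN", or None for a draw (a hypothetical encoding
    chosen for this sketch).
    """
    if winner is None:
        return 0
    magnitude = int(10000 * rows * cols / moves)  # int() drops the fraction
    return magnitude if winner == "MAX" else -magnitude


print(terminal_value(4, 5, 12, "MAX"))  # prints 16666
```

A win in fewer moves gives a larger magnitude, so a MAX win in 7 moves scores higher than a MAX win in 9 moves on the same board.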
After the user types in values for rows, columns, and n, print out the result of the minimax search (whether there is a guaranteed win for either player, or whether perfect play on both sides results in a draw) and the size of the transposition table, and then let the user play against the computer. The user should be able to choose whether they want to move first or want the computer to move first. The computer will use the saved minimax actions to play.
If your minimax algorithm determined that there is a guaranteed win for one of the players, and the computer acts as that player, then the computer should never lose.
Note: You should use a transposition table to cache minimax values and the best moves for game states
which occur multiple times in the game tree. If you don't,
your minimax search will not be able to search very large state spaces.
Note 2: Do not re-run minimax after every move. Just run minimax at the start of
the game and save all the best-move actions.
Hint: For a transposition table, I suggest using a map/dictionary/hashtable
that maps game states (board configurations) to pairs of integers, one that represents
the minimax value for that game state, and the other that represents the best move
(column in which to drop the token) from that game state. Using this, whenever minimax
encounters a state it has seen before, it can use the cached value in the transposition
table, rather than recalculating everything.
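The hint above could be sketched roughly as follows. This is one possible arrangement, not a required design: the board encoding (a tuple of columns, each a tuple of tokens from bottom to top), the tokens "X" for MAX and "O" for MIN, and all function names are assumptions made for this example. Because the number of tokens on the board equals the number of moves played, the value of a state depends only on the board, which is what makes caching by board safe here.

```python
from typing import Dict, Tuple

Board = Tuple[Tuple[str, ...], ...]  # one tuple per column, bottom to top


def empty_board(cols: int) -> Board:
    return tuple(() for _ in range(cols))


def drop(board: Board, col: int, token: str) -> Board:
    """Return a new board with `token` dropped into column `col`."""
    return board[:col] + (board[col] + (token,),) + board[col + 1:]


def wins(board: Board, n: int, token: str) -> bool:
    """True if `token` has n in a row in any direction."""
    def at(c: int, r: int) -> bool:
        return 0 <= c < len(board) and 0 <= r < len(board[c]) and board[c][r] == token

    for c in range(len(board)):
        for r in range(len(board[c])):
            for dc, dr in ((1, 0), (0, 1), (1, 1), (1, -1)):
                if all(at(c + i * dc, r + i * dr) for i in range(n)):
                    return True
    return False


def minimax(board: Board, rows: int, n: int,
            table: Dict[Board, Tuple[int, int]]) -> Tuple[int, int]:
    """Return (minimax value, best column); the column is -1 at terminal states."""
    if board in table:  # seen before: reuse the cached result
        return table[board]
    cols = len(board)
    moves = sum(len(column) for column in board)
    # Only the player who just moved can have completed a line.
    last = "X" if moves % 2 == 1 else "O"
    if moves > 0 and wins(board, n, last):
        value = int(10000 * rows * cols / moves)
        table[board] = (value if last == "X" else -value, -1)
        return table[board]
    if moves == rows * cols:  # board full with no winner: draw
        table[board] = (0, -1)
        return table[board]
    to_move = "X" if moves % 2 == 0 else "O"
    best_value, best_move = None, -1
    for col in range(cols):
        if len(board[col]) == rows:  # column is full
            continue
        value, _ = minimax(drop(board, col, to_move), rows, n, table)
        if best_value is None:
            improved = True
        elif to_move == "X":
            improved = value > best_value
        else:
            improved = value < best_value
        if improved:
            best_value, best_move = value, col
    table[board] = (best_value, best_move)
    return table[board]


table: Dict[Board, Tuple[int, int]] = {}
value, move = minimax(empty_board(3), rows=3, n=3, table=table)
print(value, move, len(table))
```

After the search finishes, the same dictionary holds a best move for every state the search visited, so it can be consulted during play without searching again.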
You will notice that Part A will not be able to search the full state space for the "traditionally-sized" game of Connect Four (6 rows, 7 columns) in any reasonable amount of time: there are just too many board configurations to consider. It turns out even alpha-beta pruning will not make this search feasible. Instead, what we will do is similar to what humans do when faced with a game that can continue for a large number of moves: we will only look ahead a fixed number of moves. To do this, we will require a heuristic function that can estimate the quality of an unfinished game state. (See section 5.4 or your notes from class.) Your program should prompt the user for an initial board size in rows and columns, and an integer n, which will be either three or four. Your program should also ask the user for the number of moves to search ahead, which we will call d (for depth). d counts moves by both players, so if d=1, the computer will only be able to examine the game states resulting from its own next move. If d=2, the computer will be able to examine game states resulting from its own next move and the user's response move.
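To make the idea of a heuristic function concrete, here is one possible sketch; it is not the heuristic from the textbook or from class. It assumes the board is a tuple of columns (each a tuple of "X" or "O" tokens from bottom to top, with "X" as MAX), and it scores every window of n cells that only one player occupies, weighting a window by the square of the number of tokens in it. The weighting and all names are assumptions chosen for illustration.

```python
from typing import Optional, Tuple

Board = Tuple[Tuple[str, ...], ...]  # one tuple per column, bottom to top


def heuristic(board: Board, rows: int, n: int) -> int:
    """Estimate an unfinished state: positive favors MAX ("X"), negative MIN ("O")."""
    cols = len(board)

    def cell(c: int, r: int) -> Optional[str]:
        # None marks an empty cell above the tokens in column c.
        return board[c][r] if r < len(board[c]) else None

    score = 0
    for c in range(cols):
        for r in range(rows):
            for dc, dr in ((1, 0), (0, 1), (1, 1), (1, -1)):
                end_c, end_r = c + (n - 1) * dc, r + (n - 1) * dr
                if not (0 <= end_c < cols and 0 <= end_r < rows):
                    continue  # this window would leave the board
                window = [cell(c + i * dc, r + i * dr) for i in range(n)]
                x_count, o_count = window.count("X"), window.count("O")
                if x_count > 0 and o_count == 0:
                    score += x_count * x_count  # still winnable by MAX
                elif o_count > 0 and x_count == 0:
                    score -= o_count * o_count  # still winnable by MIN
    return score


# One X in the bottom-left corner of an otherwise empty 6-by-7 board.
board = (("X",),) + tuple(() for _ in range(6))
print(heuristic(board, rows=6, n=4))
```

Whatever heuristic you choose, it is worth keeping its magnitude well below that of a real win (at least 10000 under the terminal-value formula), so that the search never prefers a promising unfinished position over an actual win.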
Unlike Part A, where we used minimax to "pre-process" the game tree to get the best moves, we will use minimax with alpha-beta pruning after each move of the game. The reason is that, because we are using a cut-off depth, running a new search after each move means we will always be able to look d moves ahead from the current position.
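A depth-limited alpha-beta search can be sketched generically as below. To keep the example short and runnable, the game is passed in as three hypothetical callbacks (successors, terminal_value, heuristic), and the demonstration uses a small hand-built tree of nested lists rather than a Connect Four board; in your program these callbacks would be replaced by your own move generation, win/draw detection, and heuristic.

```python
import math
from typing import Any, Callable, Iterable, Optional, Tuple


def alphabeta(state: Any, depth: int, alpha: float, beta: float, maximizing: bool,
              successors: Callable[[Any], Iterable[Tuple[int, Any]]],
              terminal_value: Callable[[Any], Optional[int]],
              heuristic: Callable[[Any], int]) -> Tuple[float, Optional[int]]:
    """Return (value, best move) looking at most `depth` moves ahead."""
    finished = terminal_value(state)
    if finished is not None:  # real end of game: use the exact value
        return finished, None
    if depth == 0:            # cut-off reached: estimate instead
        return heuristic(state), None

    best_move = None
    value = -math.inf if maximizing else math.inf
    for move, child in successors(state):
        child_value, _ = alphabeta(child, depth - 1, alpha, beta, not maximizing,
                                   successors, terminal_value, heuristic)
        if maximizing and child_value > value:
            value, best_move = child_value, move
            alpha = max(alpha, value)
        elif not maximizing and child_value < value:
            value, best_move = child_value, move
            beta = min(beta, value)
        if alpha >= beta:     # the other player will never allow this branch
            break
    return value, best_move


# Toy game tree: inner nodes are lists of children, leaves are terminal values.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
result = alphabeta(
    tree, 2, -math.inf, math.inf, True,
    successors=lambda s: enumerate(s),
    terminal_value=lambda s: s if isinstance(s, int) else None,
    heuristic=lambda s: 0,  # placeholder estimate for cut-off states
)
print(result)  # prints (3, 0)
```

In the toy tree, the second subtree is abandoned as soon as the leaf 2 is seen, because MAX already has a move worth 3; that is the pruning step. Calling the function again after every move, with the same depth, is what keeps the look-ahead at d moves from the current position.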
Your program should immediately launch a game with the computer moving first. In other words, the computer will run alpha-beta with heuristics, looking ahead d moves, and make what it determines to be the best move. Then the user will get a chance to move, and the computer will run another search, looking d moves ahead again. Because the game has progressed by two moves, this new search reaches two levels deeper in the game tree than the previous one did.
Note: Because we are using a cut-off depth, the computer player is not expected to play perfectly, even in cases where it would always have won in Part A. That being said, with a deep enough cut-off depth and a good heuristic, the computer player should still play quite well.