CS 152 Final Project :: An Examination of a Cribbage playing Neural Network

An Examination of a Cribbage playing Neural Network

Corey Hebert

The objective of this project was to create a neural network that would be able to suggest moves for the game of cribbage or to play the game "competently" in some fashion. Secondarily, the hope was to find, out of the set of networks that played competently, a network that was better in terms of training time or average score per hand or some other metric. To simplify the project, I began by focusing simply upon the discarding aspect of the game, as this is the primary source of points in the game of Cribbage, I had seen other papers where authors had success with this aspect of the game, and it seemed to be a tractable and appropriate problem.

To achieve the above stated goal, I began by creating a general function which would accept various parameters such as the style of network to use the particular parameters for initializing the network, and the number of training cycles to use. Then, I created a script that ran many different combinations of parameters through this general function, captured their outputs, and saved the results for examination to determine if any of the goals were met.

I called my general function discard_plot.m (Matlab source code below) as I was focusing upon the discarding aspect of the game, and built handles into the code to evaluate progress and enable plotting of average scores throughout the training cycles. The code began by initializing a network of the specified type, then generating a hand to by played by the network and its opponent (a player playing randomly). This hand was a 52-element column vector in which the cards were considered to be ordered, with the first 13 being one suit (ace through king) the next 13 being the next suit and so on. Since in cribbage the exact suit of a card never matters, just if cards are of the same suit, it is immaterial in which order the suits are in the vector. The player's discard choices were then determined by passing the hand vector to the network or other generator which decided which cards should be discarded. This output was a 6-element column vector which presumed an ordering of the hand (top to bottom of the hand vector) and had four 1's indicating the cards that were to be kept. In many cases, the network output needed to be interpreted and it was assumed that the lowest two values should be discarded. At this point it was known which cards had be "dealt" to the players, and another function was called to determine the cut card from the remaining cards in the "deck". This function, cut.m, simply determines a random card, checks to see if it has already been dealt, and if not, returns it as the cut card, and if so, it tries again. With a hand and cut card, the score for a hand could be determined by passing these to the score.m function. Then, but comparing the score values for the two players, who had played the same hand, we could determine if the network player played better and reinforce accordinly. The process of generating a hand, deciding discards, generating a cut card, scoring hands and training was iterated many times in hopes the network would ultimately be able to consistently outperform a random player and, in this particular definition, be considered competent.

discard_plot.m

cut.m

score.m

To evaluate the success of the above method, many trials were run varying the parameters of the networks and the length of training. These included using the built-in matlab networks newff, newcf, newpnn, and newrb, several network sizes varying from tens of nodes to hundreds of nodes and included varying the size of any hidden layers of the network, using several different transfer functions including logsig and purelin, using several different network training functions including trainlm, trainbfg, and trainrp, and lastly varying the number of iterations used to train the network from just a few up to tens of thousands. Ultimately though, all of these trials failed to develop a network that outperformed the random player. There was not even an upward trend observed in the average scores of the network player which might have indicated that with additional training time the network could develop to a point where it consistently was playing at a higher level. At this point, a solution was coded that used a genetic algorithm and our representation of hands and discard choices as this was a solution that other authors had been observed to have success with. The hope was that a success with this method would validate the general approach and encourage examination of the other networks in hopes that a solution could be found to make them similarly successful. Unfortuately though, through all the trials with the genetic algorithm, it also failed to outperform the random player.

genetic.m

Ultimately, the project taught several lessons and inspired some places for further examination. The first, and largest, thing observed when running the trials was the amount of time and space necessary for training. Several of the network training functions are very space intensive and this limits the size of networks that can be used with some training algorithms. Additionally, the apparent necessity for training for tens of thousands of iterations or longer means that training a network may take on the order of days (as in fact some trials did take this long to complete). One future avenue for improvement would be to parallelize the problem as much as possible, perhaps through the use of significant batch training. Certainly many iterations would still be necessary, and large networks may still be difficult to train, but this would enable the network to see a greater number of suggested inputs and outputs through a fashion that could certainly benefit greatly from parallel computation. Additionally, it seems probable that exploring the full problem space (millions of possible 6-card hands) was a bit too optimistic for this project. In order to fully develop a network that could generalize sufficiently well across this space, one might need to leverage larger networks or train for many hundreds or millions of iterations more. Alteratively, one might wish to attack a simpler problem as some other authors were seen to do and then perhaps extend that solution to larger portions of the problem space.

Some sample trials using discard_plot.m

Copy of final presentation