Pointy Stone 3 Othello by Jonathan Kreuzer
Mainpage (with download) | Using PS 3 | The PS 3 Brain | The Future of PS 3 | Othello Program links | Go to 3DKingdoms.com |
Pointy Stone 3 was programmed for Win95/XP using Microsoft Visual C/C++.
The win32 SDK was used for windowing functions, and winsocks for network functions. --Playing Strength: If you're fairly new to Othello you'll probably think Pointy plays quite well, even on lower difficulty levels. I'm a pretty good Othello player, and find that 4 ply usually beats me in a 2 minute speed game, however its a good challenge. 8 ply is almost impossible for me to beat in an honest game (no take backs, no evaluation info, etc,) but it has happened.
Pointy will destroy a lot of other Othello programs, but
this is because there are a lot of Othello programs out there that aren't
that strong. Pointy still plays below the strength of the best programs out
there, like WZebra. Pointy is getting better though, and I suspect if I get around
to working more on Pointy, he will reach the level of the top programs.
Evaluation Features--Opening Book: The Opening Book is fairly small, and also I made the mistake of not accounting for transpositions (ie. a position must be re-entered if it is reached through a different string of moves, except the two common ways of reaching the Tiger are accounted for.) It does account for symmetry. The Opening Book is meant to provide some diversity in gameplay, make Pointy play common openings and continuations, and help out his opening ability playing these openings, especially on lower difficulty levels. At your own risk, you can add positions to the opening book yourself by typing 1, 2, 3, 4, or 5 during the game... 1 means the position is good for black, 2 somewhat better for black, 3 about even, 4 somewhat better for white, 5 good for white. 6 will remove a position from the opening book. Positions added to the book must be within the first 36 moves of the game. Low Randomness: Point will play the best valued move in the book (unless it's a -2 for him, then he will do a midgame search). Pointy will randomly choose between moves in the book with the same value. 10-ply and above default to this behavior. Medium Randomness: Point will play either a move of the best value, or 1 away from the best value. This should give greater variety for human play, and the positions should all be decent for human competetion. (Top programs are likely to be able to win from many -1 positions every time though.) --Number of Markers: Pointy will check to make sure he can't be wiped out within the depth of his lookahead. --Potential Mobility/ Mobility: All Pointy knows about Mobility is now included in pattern values. --Stability Estimations: All Pointy knows about Stability is now included in the pattern values. --Patterns: These are the main evaluation features. To learn values for the patterns, 1.3 million training positions were generated and scored, then gradient descent was used to give values to the pattern weights. Pointy completed this process computing on and off over the course of a couple days, with most of the time being spent on scoring positions. The 1.3 million training positions (split over 7 stages) isn't enough for best results. Since many parts of this process were almost automatic, I hope to fully automate this and leave it running for a few days ( I think 10.3 million would be a good number.) Learned Patterns:
3x3 Corner Patterns ( 9 discs , occurs 4 times on board)
Edge + 2X Patterns ( 10 discs , occurs 4 times on board) and 8 Horizontal/Vertical slices (8 discs) 18 Diagonals of length 4 to 8 Each of these patterns, when added, allowed Pointy to beat the version without the pattern in a self play tournament. None of these choices for basic patterns to use are original to Pointy Stone. However I did implement a few original ideas I think, and hope to try out more if I have time. I didn't have enough time to tune my learning algorithm decently. Patterns Pointy has not encountered or encountered only very rarely in his studies get neutral or near neutral values, which probably throws off the evaluation somewhat. Others are "over-fitted": rare but not rare enough to get neutral values, so instead they crazy values that best fit the training positions, but are bad in the actual game. One thing I added to the patterns were flags. (For example, the 3x3 corner pattern includes a flag on whether or not the the corner can be taken along the diagonal, and the stability of the edge squares 2 away from the corner. The 2x5 corner contains a flag telling whether a move to the C square is possible.) These do slow the evaluation down, but only very slightly. It seems natural that well-chosen flags can lead to better pattern values. After all each square in the pattern is a flag giving the black/white/empty status of that square, but important concepts for areas of the board include more than just square values. Because of the rush job on learning, I didn't have enough training positions to get decent values for flagged patterns. A better way would have been to compute the pattern values without the flags, then put this value in for all flag states. Then for patterns that occur often enough use gradient descent to set the values as before. This way if a pattern & flag combination isn't represented enough by the training positions, it still probably will get a decent value from just the pattern. I have a few other ideas for interesting things to implement that perhaps I'll create a section about, but I feel I should study / tune what I have better first. Search
Speed:
in the midgame he searches about 1100 KiloNodes/second (low: 900 Kn/s, high:
1400 Kn/s) Every board reached in the search is counted as a node. (Evaluated positions, and all those leading up to them. Basically the node counter increments after the Do Move function.) I have some ideas for some more speed optimization, but the speed is fast enough that more optimization is not the highest thing on my priority list. The Endgame search was about 1000 Kn/s faster but tweaking to order the moves better slowed it down (and let it search 1/4th as many nodes as before.) The Midgame was about 300 Kn/s faster at one time too, but it had less othello knowledge and worse move ordering. Pointy Stone 3 uses minimax search enhanced by: Alpha-Beta: Alpha-Beta is a requirement to reach decent search depth. It allows Pointy to search 8 ply in the time it would take him to search 5 ply without alpha-beta. Always Searching the best move first at each level will lead to optimal alpha-beta searches, but if the best move was known, we wouldn't need to search. So it's good to approximate it. * Shallow Searches : (usually 1 ply) to search better moves first. * Transposition Table : This stores information about posititions previously ecountered in the search. This helps reduce # of nodes searched somewhat for the midgame, and greatly for the endgame. * Iterative Deepening : Combined with a Transposition table this can give better move ordering, and is needed for timed games so when Pointy runs out of time to search to a certain depth, he just takes the best move from the previous depth. * Modified History Heuristic : When it would take too long to use a shallow search, it sorts by values of the move squares taken from the last time a shallow search was used for the player to move. Also, on the last ply of the midgame search, the best move square from this heuristic is tested for a cutoff first before generating all available moves. It doesn't reduce the number of nodes much, but it helps noticeably with speed. --Win/Loss: On higher Difficulty levels Pointy will perform a Win/Loss search, and if he has a winning move play it (or play a drawing move if he has one and no winning move.) A Win/Loss search is quicker to perform than a perfect endgame search because Pointy's moves don't have to be proved optimal (just winning is good enough). (Or the other way around if he's losing.) If his opponent has very poor mobility and thus few moves, the win/loss search probably will be quite fast. For instance in an extremely uneven position, I once ran a 34 ply win/loss search in 1 second. Of course in obviously won positions, win/loss searches are pointless. In more even positions you probably don't want to search this way from more than 24 ply out, or 26-28 ply out if you don't mind the possiblity of a long wait. --Perfect Endgame : This is exactly what it sounds like... Pointy will play perfectly when there are 8 to 20 empty squares, (depending on difficulty level.) To do this he will look at all possible variations until neither player can move. This search is done by performing a win/loss search, then homing in on the perfect value. In most positions (especially very uneven ones) this can take much longer than a win/loss search. It's usually best not to run it from more than 22 ply out if you don't want a long wait. |
Page by: Jon Kreuzer.