a Connect4 search Algorithm

For the complete Delphi project and source code look [ HERE ]

This document describes how my computer program CONNECT4 (version 4.0)
calculates the best move.
(To download the game, please go to the English homepage of this website)

My work on CONNECT4 started in 2001. The first ideas for an algorithm
came up while working on solutions of Peg-Solitaire.

After the first working example I kept wondering how to speed up the search process
to look further ahead within acceptable time and how to make the computer
a more interesting player.
These additions to the basic algorithm will be described as well.

In the current version 4.0 the search process has two distinct steps:
    1. A qualitative analyses of the board, looking just one move ahead.
    This analyses results in a recommended move for
    2. A quantitative (brute force) approach, which tries to find a better
    move than the one recommended.
The result of step 1. is influenced by the selected Strategy.
In step 2, the search depth is 2*level+1.

The Game
    reduced image:

My version of CONNECT4 is single player, you play against the computer.
The board has 9 columns and 7 rows, resulting in 63 fields.
Computer and player alternately place a ball of their color in a field.
The computer always plays with red, the player with blue.
The columns are filled bottom-up, so the balls are actually dropped down
a column.
The winner is the first to achieve a horizontal,vertical or diagonal line
of 4 balls of his color.

Coding the board and moves
The game has 63 fields, each of which can be can be:
    - empty (code 0)
    - red (code 1)
    - blue (code 2)
Array BORD[1..9,1..7] holds the boardstate.(BORD is Dutch for board as you guessed)
BORD[2,3] := 1 places a red ball in column 2, row 3.
BORD[4,7] := 0 removes the ball of column 4, row 7.
These moves are not visible to the player. The computer is trying moves to
find the best one and only this best move will finally be visible on the screen.

Variable KOLOMHOOGTE[1..9] (dutch for column height) holds the number off balls
already present in a column.

To find the best moves, the computer should do many alternating red and blue
ZET[1..20] holds these moves.(ZET is dutch for MOVE)
A move is a number 1..9, the selected column.
ZET[1] is the first trial move (red, computer),
ZET[2] is the second trial move (blue, player).
If ZET[3] has not been played yet then ZET[3] = 0.

Odd moves are red (computer), even moves are blue (player).

ZET[1..20] , the list of moves, is a 20 digit counter,
where each digit counts 0..9.
For level 2, where the search depth is limited to 5, only ZET[1..5] is used.
By counting from start -00000- to finish -99999- while registrating winning
and losing conditions, the best ZET[1] can be choosen.
Counting is done in the following way:
    - initialize, setting ZET[1..20] to -0- , no move done
    - move 1 (red) : ZET[1] := 1. (place red ball in column 1)
    - move 2 (blue): ZET[2] := 1. (place blue ball on top of red in column 1)
    - move 3 (red) : ZET[3] := 1. (add red ball to column 1)
After each move, the variables BORD and KOLOMHOOGTE have to be updated
to reflect the new boardstate.

In the situation above, we assumed that no winning situation occurred and
column 1 had enough space to hold the balls.
If a column is full, the next column is choosen.
If the number of moves done is the maximum number allowed, the next step is
to take this move back and choose the next column.
Writing a sequence of moves as digits from left to right and indicating
consecutive counts by ... , the way of counting is: (for level 1, search depth=3)

000, 100, 110, 111, 112, 113..119, 120, 121..129, 130, 131..139,......., 900..999
the end of the search.

What is a Node
In mathematics, a Graph consists of a collection of dots, which may be (or not be)
connected by lines.
Graphs are abstract constructions to describe processes like competitions,
industrial planning or games.
A distinct type of graph is the Tree Graph. These graphs have no isolated dots,
travelling from dot -A- to dot -B- and back is only possible by the same lines,
(no loops exist) and removing a single line would isolate part of the dots.
These graphs look like a tree with branches, hence the name.

We use the tree graph to represent the CONNECT4 search process.
A dot is called a NODE.
This node is simply a collection of numbers of which one is the line (= move, columnn)
selected to the next node in the tree.

Playing at higher levels, the tree may have millions of branches, but only one branch
at the time can exist: a row of nodes connected by single lines.

Node 1 holds the first (trial) move of the computer and node 2 is the next move,
representing the player. The highest node is 2*level+1.

A node also holds a number called RATING, which indicates winning or losing conditions.
The higher this rating, the more succesfull the node is.
The general idea behind the search process is: every node tries to obtain the highest
possible rating for itself.

During the search process nodes are added / removed as moves are done or taken back.
An existing node never decreases its rating. As we shall see, only node -1- has to
remember the column for which the highest rating was achieved.

Node 1 has an extra variable called BESTZET. Before the search starts, BESTZET is
set equal to the first move of node 1.
If RATING[1] increases, BESTZET is set to the move (column) of node 1.
At the end of the search, BESTZET is the move the computer will show on the screen.

Winning and Losing
After each move, a test should be made to see if somewhere four balls
of the same color are horizontally, vertically or diagonally connected.
Say, such a situation was found at node 10 (even, a player move). Than node 9
is losing in 1 move, while node 8 is winning in 2 moves.
To win in 3 moves is better then to win in 5 moves. To lose in 2 moves is
worse than losing in 6 moves. With this considerations in mind, the variable
RATING[1..20] was choosen as shown below:

    ratingmeaning for NODE n
    20winning at move n (this node)
    19winning at NODE n+2
    18winning at NODE n+4
    17winning at NODE n+6
    10neutral, no win or lose situation
    3lose in 5 moves = win at NODE n+5
    2lose in 3 moves = win at NODE n+3
    1lose in 1 move = win at NODE n+1
    0no rating available yet
RATING[n] is indicating the success of move n. The higher the better.
Variable RATING[] obtains its value in two ways:
Initially RATING is set to -0- which means that the rating is yet unknown.
When the highest NODE (n) finds no win, RATING[n] is set to 10, the neutral
2. When a win is found at NODE n, RATING[n] is set to 20, the highest rating.

When the ZET[ ] counter = 340, meaning that for move 1 column 3 was choosen,
the next move was in column 4 and move 3 was not yet done, this means that
now node 3 must be added.
When ZET[ ] = 349, node 3 must be removed, next ZET[ ] = 350, node 3 is added again,
ZET[ ] becomes 351,352...etc.

When closing (removing) node n and going back to node n-1, the new rating for
node n-1 is calculated using the following rules:

x := 20 - RATING[n]
if x < 10 then x := x + 1
if x > RATING[n-1] then RATING[n-1] := x

OPEN (adding) a node n includes:
    1. set RATING[n] := 0
    2. select first possible column
    3. test for string of 4 connected
    4. if win: RATING[n] := 20, close node, BACK to previous node
    5. if last node: NEXT move, goto step 3. else OPEN next node
BACK to previous node n-1:
    1. calculate RATING[n-1]
    2. remove old move of node n-1
    3. NEXT move
NEXT move:
    1. ZET[n] := ZET[n]+1
    2. if ZET[n]=10 (all columns done) then close node. If last node: RATING[n]:= 10
    3. if KOLOMHOOGTE[n]=7 then goto step 1
    4. adjust BORD and KOLOMHOOGTE
This describes the search algorithm. The program can be built, the game
However, it is utterley boring. The computer prefers column 1, or else column 2.
The most common rating is 10 and this rating is allready obtained
by move 1 in column 1 in most cases. BESTZET=1 is this case.
Added to that: the search proces is very slow at search depths > 8.

Replacing Drabness by Variation and Surprise
A node performs moves sequenced 1..9. Column 1 is the first selected.
But to make the game more surprising, we choose a random column, say 5,
to start with. Now each node counts the columns 5..9,1..4
When no win is found, BESTZET will be this random column.
Before each search, a new random column to start with is choosen.
Now we observe alternating apparent smart and stupid computer-moves.
This is the way the -RANDOM- strategy works.

Speeding Up
Much time during the search process is spend testing for a line of four balls
of the same color.
As we noticed, BORD[ ] is a 2 dimensional array. Since the computer memory is a
1 dimensional array (of bytes), for each field with column c and row r,
a small calculation is necessary to know the place of BORD[c,r] in the
computer memory.
This calculation may be eliminated by defining a 1 dimensional array BORDX[..]
at the very same memory locations as BORD[ ].
If p = 9*c+r , then BORDX[p] is the same memory location as BORD[c,r].
Adding 9 to p selects the field to the right, subtracting 8 from p selects the
field left below.

When a node wins, this node closes and the search process continues at the
previous node. All branches of a winning node are skipped.
The lower the node, the more time is saved in this case.
So, it is favourable to detect a win as soon as possible during the search.
This is done in the following way:
after a node opens (is added) it tests all columns 1..9 for a possible win
before a move is done and the next node is opened.

Testing for a win can be shortened in another way as well:
instead of adding a ball to a column and testing for a line of 4 balls
of the same color, we calculate if a move in a certain column would
cause a line of 4.
In this way, the move (plus it's removal) is no longer necessary in case 2. above
and in the last node.

While scanning the board for a line of 4 a test must prevent crossing the edges.
However, this test is not necessary if we add edges of "0" around the board.
Reaching an edge now detects a "0" (empty field) and the scanning for that direction
will stop.
So, BORD[1..9,1..7] becomes BORD[0..10,0..8], BORDX[1..63] becomes BORDX[0..98].

In the random strategy, each node counts the columns starting at a random column,
the STARTKOLOM, say 7, so the counting sequence is 7,8,9,1,2,3,4,5,6 in this case.
The search process is speeded up by about a factor 4 if each node's startcolumn is
1 column higher than the previous node's startcolumn.(8,9,1,2,3,4,5,6,7)
A variable STARTKOLOM[ ] is added to each node. The values are calculated at the
start of the search.

All these measures however are marginal compared to the SHORTCUTS.

Shortcut 1
Consider the following case:
The rating of node 4 is equal to 10.
Node 5 just obtained rating 10, so RATING[5] = 10 and node 5 has only done part
of all moves.
Node 5 may select the next move in an effort to increase it's rating. However,
this action is of no influence to the rating of node 4.
If node 5 reaches a rating of say 16, then x := 20-16 + 1, x := 5.
This is less then the rating of node 4, which will therefore not be changed.
So, it is unnecessary for node 5 to perform any more move, opening a next node.
Node 5 can be closed and node 4 continues the search to increase it's rating.
If this (shortcut) takes place in the lower nodes, incredible time may be saved.
Millions of moves are skipped.
This is the most important time saving measure.

Shortcut 2
The two highest (deepest) nodes behave differently.
The last (deepest) node can only have a rating of 20 (win) or 10 (neutral).
The previous node to this thus can have the rating 1 (lose in 1 move) or
10 (neutral).
So, this node must be closed when reaching a rating of 10.

Rating and Treshold
To sense a shortcut1 situation, ratings of the current- and previous node
must be compared.
However, we may "transport" the rating of node n-1 to node n, calling this number
So, a variable TRESHOLD[ ] is added to each node.
After opening node n+1 this happens:
x := 20 - RATING[n]
If x > 10 then x := x + 1
TRESHOLD[n+1] := x
Now, a node is closed (all moves not yet done skipped) if the rating of the
node reaches or exceeds it's treshold.
Opposite to the rating, the treshold of a node may only be decreased.

Node 1 starts with a treshold of 20 and a rating of 1. (only closes on win)
If a node does not detect a win in any of it's columns after opening, a treshold
of 20 is decreased to 19.
The 2 last nodes start with a maximum treshold of 10.
If a node has performed all possible moves, than the treshold is set equal
to the rating only if this causes a lower treshold.

Opening node n:
    x := 20 - TRESHOLD[n-1]
    if x > 10 then x := x + 1
    RATING[n] := x

    x := 20 - RATING[n-1]
    if x > 10 then x := x + 1
    TRESHOLD[n] := x
Closing node n:
    x := 20 - TRESHOLD[n]
    if x < 10 then x := x + 1
    if x > RATING[n-1] then RATING[n-1] := x
Example 1:
(search depth 10, no win found yet)

    node 8 9 10 situation / remarks
    TRESHOLD19100 node 9 did not detect win and opens node 10
    TRESHOLD191010 node 10 did not detect win and closes
    TRESHOLD191010 node 9 closes because treshold = rating
    TRESHOLD19---- node 8 performs next move, opens node 9 etc.
Example 2
(previous win in node 7, node 3 rating indicates win at 4 moves ahead)

    node 3 4 5 situation / remarks
    TRESHOLD19---- open node 4
    TRESHOLD19 2-- open node 5
    RATING18 1--
    TRESHOLD19 220--
    RATING18 119
If node 5 does not detect a win, then it's treshold will be lowered to 19.
Since the treshold is now equal to the rating, node 5 will close.
For node 3, a win in node5 is the only opportunity to increase it's rating,
since earlier a win occurred in node 7.
Again many moves are eliminated.

Reduced Column testing
Immediately after opening, a node tests all the columns for a possible win.
Since a win was not detected in the nodes before, such a win can be the result
    1. the move in the previous node (opposite color)
    2. the move 2 nodes earlier (same color)
So, in node n+2 it is impossible to sense a win in columns that are more than
3 columns apart from the move of node n.
As a result of the move in node n+1, only a win in the same column is possible.

A variable COLCHECK[ ] is added to each node. Bit c of COLCHECK[n] = 1 if
this column needs to be tested for a win.
After the column is checked, the corresponding bit is reset.
Node n sets COLCHECK[n+2] to the right value.
The column selected by the previous node is unconditionally tested for a win.

Recognizing previous board states
Different sequences of moves may result in the same board state.
examples are:
    - 1, 2, 3 and 3, 2, 1
    - 7, 5, 9 and 9, 5, 7
    - 1, 1, 2, 2 and 2, 2, 1, 1
Equal board states will produce equal ratings.
When a board state reappears, its rating may be remembered and subsequent
moves in the search tree may be skipped.
A problem however is that recognizing previous board states takes much
processor time.
This time must be subtracted from the time saved. Because of shortcuts, the
actual time saved is somewhat uncertain.
Therefore, limitations are incorporated to keep things simple.

Consider 3 consecutive moves. There are 9*9*9 = 729 of such sequences.
Identical board states are produced by sequences of the type 1,2,3 ... 3,2,1
or in general : a,b,c .......c,b,a.
Of this type , 9*8*7/2 = 252 sequences have twin board states.
Thus, in the most favourable situation, we may save 252 moves (and remainder of tree) out of 729,
which is 35%. This may occur in each node >= 3.

Now consider 4 consecutive moves.
All sequences leading to a duplicate board state have the form 1,1,2,2 ..... 2,2,1,1
in general a,a,b,b ..........b,b,a,a
Every other node may detect such a situation.
There are 9*9*9*9 = 6561 sequences of 4 moves of which 9*8/2 = 36 may cause
identical board states. We will not waste time on sequences of 4 moves.

So, here is the method used by CONNECT4 version 4.0 to recognize previous board states
considering three consecutive moves only.

First a description of the variables.
Each node, starting at 3, has a variable Q3MapAddr which holds the hexadecimally coded
3 consecutive moves.
If zet[5] = 3, zet[6] = 2 and zet[7] = 9 then Q3MapAddr[7] = $0329.

Table Q3Map[$111 .. $999] has an entry for every possible sequence of 3 moves.
The table is built when the game is opened first.
If a sequence of three moves is of type a,b,c then Q3Map[$abc] and Q3Map[cba] are set to a same
unique number in the range 1..252.
Other entries of Q3Map are set to zero.

Each node has a table, QReg, of 252 entries of the following record:
    lock : LongInt;
    score : byte;
The score field of node[n] holds the rating which is calculated when node[n+1] closes,
if the moves of nodes n-2,n-1,n where of type a,b,c.

Each node also has a variable Q3Key, which is a LongInt integer.
Q3Key[n] is stored in Q3Reg[n,i].lock when node[n] sets the score field of Q3Reg[n,i].

The key and lock variables form a validation mechanism.
On a new move of node p, Q3key[p+3] is incremented so all Q3 entries for node p+3 are invalidated.

To register a rating for future reference, following actions take place:
    node[n+1] closes and passes a new rating to node[n]
    node[n] uses it's Q3MapAddr as an address to Q3Map.
    If a nonzero value i is red, this value is used as an address for QReg of node[n] and
    the rating and Q3Key values are written in Q3Reg[n,i].score and Q3Reg[n,i].lock.
To recognize and use a previous board state and it's rating :
    node[n] advances to a new move.
    Q3MapAddr is updated to reflect the new sequence of three moves.
    i = Q3Map[Q3MappAddr]] is red and checked for nonzero.
    If so, Q3Reg[n,i].lock is compared to Q3Key.
    If compare (valid entry) then Q3Reg[n,i].score is compared to the current rating of node[n].
    If the current rating is equal or higher, node[n] skips the move and advances to the next move.
Measurements show, that over 99% of cases where the key and lock values compare,
a move (and subsequent tree branches) are skipped.

This concludes the description of the -brute force- search process.

The analyse button
With the analyze button pressed, the computer shows a prediction
for each column choosen next by the player.
To accomplish this, in a table the tresholds for each column of node 3 are stored.
Node 3 must be disabled to make shortcuts, which is done by setting TRESHOLD[3]=20
upon opening of the node, regardless of the rating of node 2.
The search process is slowed down about 2.5 times.

When strategy 0 is selected, the search process is as described before.
If no win is detected, seemingly smart and stupid computer-moves are noticed.
This generates vivid games, but the moves lack cohesion.
"Strategy" is added to give the computer a distinct style for the price of
a more predictable response.

When selecting a strategy > 0 , the computer starts by a trial move in a column
followed by an analyses of the board state. The result is a score.
The column with the highest score is presented to the -brute force- quantitative
search as STARTKOLOM[1].
The quantitative search tries to find a better move then this one.
If no win is found, BESTZET = STARTKOLOM[1], the actual computermove.

A different strategy uses a different algorithm to calculate the score.
The following is done:
A possible line of 4 balls on the board is called a street.
A board of 9 columns and 7 rows has 126 streets (just count!)
A type is assigned to a street:
    0 : empty street, no ball placed yet
    1 : only some red balls, computer may win later
    2 : only some blue balls, player may win later
    3 : red and blue balls, no win possible
A (street)value is generated for each street.
The streetvalue is simply the number of balls (0..3) placed.

For each empty field, 2 values are calculated:
    fieldvalue 1 : sum of streetvalues of all type -1- streets where field is part of
    fieldvalue 2 : sum of streetvalues of all type -2- streets where field is part of
The strategy is a 3 bit number. The contribution of each bit to the score is:

    xx1 : score := score + 2 * sum of all streetvalues of type 1 streets
    x1x : score := score - 2 * sum of all streetvalues of type 2 streets
    1xx : increase value of a field with value of field just above
    for each field on distance 2: score := score + 3 * fieldvalue
    1x1 : score := score + sum of all fieldvalues 1
    11x : score := score - sum of all fieldvalues 2
Style and strategy:
    0 : random
    1 : expansive
    2 : restrictive
    3 : neutral
    4 : tactical
    5 : tactical, expansive
    6 : tactical, restrictive
    7 : tactical, neutral
The influence of the strategy on the game decreases at higher levels.

Summary of variables

    n byte, indicate number of node
    BORD[0..10][0..8] array of 11*9 bytes, holds board state, borders are -0-
    KOLOMHOOGTE[1..9] number of balls in this column
    STARTKOLOM[1..20] first trial move for node 1 .. 20
    ZET[1..20] byte, the move of node 1 .. 20
    BESTZETbyte, holds best move of node 1
    RATING[1..20] current rating of node 1 .. 20
    TRESHOLD[1..20] highest rating possible for node 1 .. 20
    COLCHECK[1..20] 16 bit word, bits 1..9 indicate columns to be tested for win
    Q3MAP[$111..$999] 729 entries, 1 for each -3 moves- sequence.
    Q3MAPADDR[1..20] 16 bit word, hex encoded last 3 moves of nodes n-2, n-1, n
    is index into Q3Map
    Q3REG[1..20,1..252] record: lock,rating. 252 entries per node
    Q3Key[1..20] long (32 bit) integer, validation number for each node

This concludes the description of my CONNECT4 search algorithm.