Tic-Tac-Toe and AI: A Winning Board (Part 2)

As a first learning task for using AI with Tic-Tac-Toe, let us take the task of determining whether a board is a winning one or not (i.e. whether either X or O has won the game). We cannot directly tell the neural network the rules that make a board a winning one. Instead, we need to train it with examples. For that, we already prepared some data in a previous post.

The idea is to train a TensorFlow model with several Dense layers. By varying the model’s configuration, we want to determine how complex such a model needs to be (e.g. how many parameters it needs) to fulfill this task. Looking at the achievable accuracy will also be an interesting topic.

A piece of information in advance: the computations were done using TensorFlow 2.13.1 on a Windows WSL2 machine with an NVIDIA GeForce RTX 4060 (8 GB) installed. Mixed precision was not enabled.

As usual you may download the entire example using the following link:

  tictactoetf.zip (9.6 KiB, 274 hits)

Let’s get started…


Reading Data for Training and Evaluation

First of all, we need to read the data for training and evaluation. In the preparation blog post, we already created the test data file tictactoe_valid.txt for that. We will reuse the data loader for further cases as well, which is why we build it in a slightly generic way using pyrecord.

from pyrecord import Record # https://pythonhosted.org/pyrecord/

TTTRecord = Record.create_type("TTTRecord", "vector", "valid", "winning", "winner", "move")

def tttRecordGenerator():
    # Each line encodes one board: characters 0-8 hold the per-position values;
    # the valid, winning, winner and move fields start at character 10
    # (see the preparation post for the exact file format).
    with open("tictactoe_valid.txt", "rt") as f:
        line = f.readline()
        while line:
            vector = [int(line[i]) for i in range(9)]
            valid = line[10] == "1"
            winning = int(line[11])
            winner = int(line[12])
            move = int(line[13])
            yield TTTRecord(vector, valid, winning, winner, move)
            line = f.readline()

# Keep only the records flagged as valid
validtttRecords = filter(lambda o : o.valid, tttRecordGenerator())
# Warning: will take a couple of seconds!
validtttRecordsList = list(validtttRecords)
len(validtttRecordsList)

Having set up TensorFlow with

import os
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers.experimental import preprocessing

import pandas as pd
import numpy as np
import datetime

we then transform our pyrecord data into a Pandas DataFrame.

allDataFrame = pd.DataFrame(list(zip([x.vector[0] for x in validtttRecordsList], 
                                     [x.vector[1] for x in validtttRecordsList],
                                     [x.vector[2] for x in validtttRecordsList],
                                     [x.vector[3] for x in validtttRecordsList],
                                     [x.vector[4] for x in validtttRecordsList],
                                     [x.vector[5] for x in validtttRecordsList],
                                     [x.vector[6] for x in validtttRecordsList],
                                     [x.vector[7] for x in validtttRecordsList],
                                     [x.vector[8] for x in validtttRecordsList],
                                     [x.winning for x in validtttRecordsList])), 
             columns =['pos1', 'pos2', 'pos3','pos4','pos5','pos6','pos7','pos8','pos9', 'winning'])

print(allDataFrame.tail())

This gives us a first glimpse of the data:

        pos1  pos2  pos3  pos4  pos5  pos6  pos7  pos8  pos9  winning
362875     9     8     7     6     5     4     1     3     2        1
362876     9     8     7     6     5     4     2     1     3        0
362877     9     8     7     6     5     4     2     3     1        0
362878     9     8     7     6     5     4     3     1     2        1
362879     9     8     7     6     5     4     3     2     1        1

Essentially, we unpack the vector into one column per position (these will be our features later on) and place it next to the winning information (which will be our label later).
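
For reference, here is a more compact, equivalent way to build the same DataFrame. This is just a sketch; it produces exactly the columns shown above:

# Equivalent, more compact construction of the same DataFrame (sketch)
allDataFrame = pd.DataFrame(
    {f"pos{i + 1}": [r.vector[i] for r in validtttRecordsList] for i in range(9)}
)
allDataFrame["winning"] = [r.winning for r in validtttRecordsList]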

Training (Basic Model)

For training, we take the usual random 80% cut. The remainder will serve as the test set for evaluation later. We also look at the usual summary statistics of the training set.

train_dataset = allDataFrame.sample(frac=0.8, random_state=42)
test_dataset = allDataFrame.drop(train_dataset.index)

print(allDataFrame.shape, train_dataset.shape, test_dataset.shape)
train_dataset.describe().transpose()
(362880, 10) (290304, 10) (72576, 10)
            count      mean       std  min  25%  50%  75%  max
pos1     290304.0  5.004037  2.582516  1.0  3.0  5.0  7.0  9.0
pos2     290304.0  4.999249  2.580750  1.0  3.0  5.0  7.0  9.0
pos3     290304.0  4.996063  2.580538  1.0  3.0  5.0  7.0  9.0
pos4     290304.0  5.003014  2.581781  1.0  3.0  5.0  7.0  9.0
pos5     290304.0  4.996579  2.582223  1.0  3.0  5.0  7.0  9.0
pos6     290304.0  5.000279  2.581773  1.0  3.0  5.0  7.0  9.0
pos7     290304.0  4.999029  2.583215  1.0  3.0  5.0  7.0  9.0
pos8     290304.0  4.998505  2.581489  1.0  3.0  5.0  7.0  9.0
pos9     290304.0  5.003245  2.583642  1.0  3.0  5.0  7.0  9.0
winning  290304.0  0.448692  0.497361  0.0  0.0  0.0  1.0  1.0

We can easily see that the values are uniformly distributed across all positions, and that the winning value behaves like a boolean suitable for classification.
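
If you want to double-check this, a quick look at the label distribution (a small sketch) confirms the roughly 45/55 split already suggested by the mean of the winning column:

# Quick sanity check (sketch): share of winning vs. non-winning boards
print(train_dataset['winning'].value_counts(normalize=True))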

For easier access, we can now split features and labels:

# split features from labels
train_features = train_dataset.copy()
test_features = test_dataset.copy()

train_labels = train_features.pop('winning')
test_labels = test_features.pop('winning')

Obviously our features are not normalized yet. That is why we prepare a Keras Normalization Layer.

normalizer = preprocessing.Normalization()
normalizer.adapt(np.array(train_features))
print(normalizer.mean.numpy())
[[5.004032  4.9992476 4.9960785 5.0030107 4.9965568 5.0002723 4.999036
  4.9984956 5.0032406]]

Soon we will make this the first layer of our model.
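
Under the hood, the layer simply standardizes each feature with the adapted mean and variance. The following small sketch illustrates this on the first training board (up to a tiny epsilon the layer adds for numerical stability):

# Sketch: apply the normalizer to one board and recompute the result by hand
sample = np.array(train_features[:1], dtype=np.float32)
print(normalizer(sample).numpy())
print((sample - normalizer.mean.numpy()) / np.sqrt(normalizer.variance.numpy()))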

Apropos the model: as a starter, let’s take the following one:

model = keras.models.Sequential([
    normalizer,
    layers.Dense(units=64, activation='relu'), #1
    layers.Dense(units=64,activation='relu'), #2 
    layers.Dense(units=128,activation='relu'), #3
    layers.Dense(units=1)
])
print(model.summary())
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 normalization (Normalizati  (None, 9)                 19        
 on)                                                             
                                                                 
 dense (Dense)               (None, 64)                640       
                                                                 
 dense_1 (Dense)             (None, 64)                4160      
                                                                 
 dense_2 (Dense)             (None, 128)               8320      
                                                                 
 dense_3 (Dense)             (None, 1)                 129       
                                                                 
=================================================================
Total params: 13268 (51.83 KB)
Trainable params: 13249 (51.75 KB)
Non-trainable params: 19 (80.00 Byte)
_________________________________________________________________
None

As you can see, after the normalizer we have two Dense layers with 64 units each, followed by a Dense layer with 128 units. As we want a single boolean-like result (“winning or not”), the output layer is a Dense layer with a single unit. Since we follow the logits approach, there is no activation function on the output layer. Let’s see how far we get with this.
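
One practical consequence of the logits approach: when using the model for predictions later, the raw output has to be pushed through a sigmoid to obtain a probability. A minimal sketch (to be run after training):

# Sketch: convert raw logits into probabilities / boolean predictions
logits = model.predict(test_features[:5])
probabilities = tf.sigmoid(logits).numpy().flatten()
print(probabilities, probabilities > 0.5)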

To ensure that we do not overfit our model, let’s make sure that fitting stops at the latest once we reach an accuracy of 1.0. For that, we define a small custom callback:

class Accuracy1Stopping(keras.callbacks.Callback):
    def __init__(self):
        super().__init__()

    def on_epoch_end(self, epoch, logs=None):
        # Stop training as soon as the training accuracy has reached 1.0
        if logs and round(logs.get('accuracy', 0.0), 4) == 1.0:
            self.model.stop_training = True

Then let’s compile and fit the model right away:

model.compile(loss=keras.losses.BinaryCrossentropy(from_logits=True),
              optimizer=keras.optimizers.Adam(learning_rate=0.01), 
              metrics = ["accuracy"])
history = model.fit(train_features, train_labels, 
          batch_size=512, 
          epochs=50, 
          shuffle=True,
          callbacks=[
              tf.keras.callbacks.EarlyStopping(monitor='accuracy', mode="max", restore_best_weights=True, patience=5, verbose=1), 
              Accuracy1Stopping(),
              tf.keras.callbacks.ReduceLROnPlateau(monitor='loss', factor=0.2, patience=2, min_lr=0.002)
          ], 
          verbose=1)
Epoch 1/50
567/567 [==============================] - 6s 7ms/step - loss: 0.5521 - accuracy: 0.6821 - lr: 0.0100
Epoch 2/50
567/567 [==============================] - 3s 5ms/step - loss: 0.3881 - accuracy: 0.7986 - lr: 0.0100
Epoch 3/50
567/567 [==============================] - 3s 5ms/step - loss: 0.2644 - accuracy: 0.8709 - lr: 0.0100
Epoch 4/50
567/567 [==============================] - 3s 5ms/step - loss: 0.1828 - accuracy: 0.9161 - lr: 0.0100
Epoch 5/50
567/567 [==============================] - 3s 5ms/step - loss: 0.1332 - accuracy: 0.9435 - lr: 0.0100
Epoch 6/50
567/567 [==============================] - 3s 5ms/step - loss: 0.0902 - accuracy: 0.9631 - lr: 0.0100
Epoch 7/50
567/567 [==============================] - 3s 5ms/step - loss: 0.0713 - accuracy: 0.9716 - lr: 0.0100
Epoch 8/50
567/567 [==============================] - 3s 5ms/step - loss: 0.0696 - accuracy: 0.9740 - lr: 0.0100
Epoch 9/50
567/567 [==============================] - 3s 5ms/step - loss: 0.0528 - accuracy: 0.9808 - lr: 0.0100
Epoch 10/50
567/567 [==============================] - 3s 5ms/step - loss: 0.0445 - accuracy: 0.9841 - lr: 0.0100
Epoch 11/50
567/567 [==============================] - 3s 5ms/step - loss: 0.0496 - accuracy: 0.9836 - lr: 0.0100
Epoch 12/50
567/567 [==============================] - 3s 5ms/step - loss: 0.0285 - accuracy: 0.9907 - lr: 0.0100
Epoch 13/50
567/567 [==============================] - 3s 5ms/step - loss: 0.0433 - accuracy: 0.9860 - lr: 0.0100
Epoch 14/50
567/567 [==============================] - 3s 5ms/step - loss: 0.0181 - accuracy: 0.9955 - lr: 0.0100
Epoch 15/50
567/567 [==============================] - 3s 6ms/step - loss: 0.0552 - accuracy: 0.9852 - lr: 0.0100
Epoch 16/50
567/567 [==============================] - 3s 6ms/step - loss: 0.0022 - accuracy: 0.9999 - lr: 0.0100
Epoch 17/50
567/567 [==============================] - 3s 6ms/step - loss: 0.0032 - accuracy: 0.9995 - lr: 0.0100
Epoch 18/50
567/567 [==============================] - 3s 6ms/step - loss: 0.0914 - accuracy: 0.9779 - lr: 0.0100
Epoch 19/50
567/567 [==============================] - 3s 5ms/step - loss: 0.0030 - accuracy: 0.9999 - lr: 0.0020
Epoch 20/50
567/567 [==============================] - 3s 6ms/step - loss: 0.0022 - accuracy: 1.0000 - lr: 0.0020

Note that the training was done in less than 63 seconds, and we achieved a mind-blowing accuracy of 1.0 after only 20 epochs!
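
If you want to visualize how loss and accuracy developed over the epochs, the history object returned by fit() can be plotted; a minimal sketch using matplotlib (assumed to be installed):

# Sketch: plot the recorded training metrics
import matplotlib.pyplot as plt

plt.plot(history.history['loss'], label='loss')
plt.plot(history.history['accuracy'], label='accuracy')
plt.xlabel('epoch')
plt.legend()
plt.show()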

Evaluation (Basic Model)

Let’s check that the model has not tried to trick us by evaluating it on the test dataset:

evaluationResult = model.evaluate(test_features, test_labels, batch_size=256, verbose=1)
print(evaluationResult)
284/284 [==============================] - 4s 10ms/step - loss: 0.0021 - accuracy: 1.0000
[0.002135399729013443, 1.0]

For the test dataset, the accuracy is also 1.0! This impressively confirms that the neural network was able to learn what a winning board is.
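
As an additional sanity check, one can count how many test boards end up misclassified at a 0.5 threshold; a short sketch:

# Sketch: count misclassified test boards (should be 0 given the accuracy above)
predictions = tf.sigmoid(model.predict(test_features, batch_size=256)).numpy().flatten() > 0.5
misclassified = int((predictions != test_labels.to_numpy().astype(bool)).sum())
print(f"Misclassified boards: {misclassified} of {len(test_labels)}")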

BTW: This Keras neural network can be downloaded here.

  tic-tac-toe-Winning-Model.zip (134.8 KiB, 161 hits)
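
If you train the model yourself, saving and reloading it works as in the following sketch (the file name is just an example):

# Sketch: save and reload the trained model (example file name)
model.save("tic-tac-toe-winning-model.keras")
reloaded_model = keras.models.load_model("tic-tac-toe-winning-model.keras")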

Sensitivity Analysis

Now, let’s play around a little with the model and see how sensitive it is to its configuration. For that, we automate the evaluation in a function, which reads like this:

import time
def runTrainingAndMeasureTestAccuracy(model, learning_rate): 
    model.compile(loss=keras.losses.BinaryCrossentropy(from_logits=True),
              optimizer=keras.optimizers.Adam(learning_rate=learning_rate), 
              metrics = ["accuracy"])
    start = time.time()
    history = model.fit(train_features, train_labels, 
          batch_size=512, 
          epochs=50, 
          shuffle=True, 
          callbacks=[
              tf.keras.callbacks.EarlyStopping(monitor='accuracy', mode="max", patience=5, verbose=1), 
              Accuracy1Stopping(),
              tf.keras.callbacks.ReduceLROnPlateau(monitor='loss', factor=0.2, patience=2, min_lr=0.002)
          ],
          verbose=1)
    stop = time.time()
    print(f"Elapsed Training time: {stop-start}")

    print("Evaluating...")
    evaluationResult = model.evaluate(test_features, test_labels, batch_size=256, verbose=1)
    print(evaluationResult)

The training “loop” itself then calls this function like this:

model = keras.models.Sequential([
     normalizer,
     # here goes the model definition.
     layers.Dense(units=1)
])
runTrainingAndMeasureTestAccuracy(model, 0.01)
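
To run several configurations in a row, this can also be wrapped in a small loop; a sketch (the configurations listed here are just examples, not the full measurement series):

# Sketch: try out a few hidden-layer configurations (example values)
configurations = [
    [64, 64, 128],
    [32, 32, 128],
    [128, 128],
]
for units_per_layer in configurations:
    model = keras.models.Sequential(
        [normalizer]
        + [layers.Dense(units=u, activation='relu') for u in units_per_layer]
        + [layers.Dense(units=1)]
    )
    runTrainingAndMeasureTestAccuracy(model, 0.01)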

Running a set of measurements, this results in the following data points:

Learning Rate  Hidden Layers (Dense units)  Epochs  Training Acc.  Test Acc.  Training Duration [s]  Avg. Time/Epoch [s]
0.01           64-64-128                        27              1    0.99994                   41.5                 1.54
0.01           64-64-128                        18              1          1                   28.5                 1.58
0.01           64-64-128                        15              1          1                   23.3                 1.55
0.01           32-32-128                        34              1     0.9997                   50.6                 1.49
0.01           32-32-128                        50         0.9963     0.9957                   77.7                 1.55
0.01           32-32-128                        50         0.9934     0.9966                   74.6                 1.49
0.01           128-128-128                      12              1          1                   20.9                 1.74
0.01           128-128-128                      20              1     0.9999                   41.4                 2.07
0.01           128-128-128                       9              1          1                   15.0                 1.67
0.01           128-128                          16              1     0.9999                   22.7                 1.42
0.01           128-128                          38         0.9993     0.9993                   51.8                 1.36
0.01           128-128                          50         0.9998     0.9998                   71.6                 1.43
0.01           256-128                          50         0.9961     0.9969                   69.8                 1.40
0.01           256-128                          44              1     0.9999                   82.3                 1.87
0.01           256-128                          34              1          1                   49.0                 1.44
0.01           128-128-64                       13              1     0.9999                   20.7                 1.59
0.01           128-128-64                       11              1          1                   17.7                 1.61
0.01           128-128-64                       16              1          1                   26.1                 1.63
0.01           384                              50         0.9503     0.9477                   91.7                 1.83
0.01           384                              50         0.9472     0.9458                  108.2                 2.16
0.01           384                              50         0.9417     0.9414                   93.4                 1.87
0.01           1024                             50         0.9506     0.9429                   64.8                 1.30
0.01           1024                             50         0.9556     0.9603                   82.2                 1.64
0.01           1024                             50         0.9513     0.9554                   65.9                 1.32
0.01           32-32-32-32-32-32                40         0.9997     0.9998                   80.4                 2.01
0.01           32-32-32-32-32-32                23              1          1                   45.6                 1.98
0.01           32-32-32-32-32-32                30              1          1                   59.1                 1.97
0.01           16-16-16-16-16-16-16-16-16       50         0.9808     0.9591                  133.5                 2.67
0.01           16-16-16-16-16-16-16-16-16       50         0.9478     0.9506                  137.3                 2.75
0.01           16-16-16-16-16-16-16-16-16       47         0.9361     0.9417                  118.6                 2.52

Admittedly, this small series of measurements is far too small to draw general conclusions, but the following hypotheses appear worthwhile to take a closer look at:

  • More nodes per layer do not automatically mean better results.
  • A larger number of layers does not automatically yield better results either.
  • Reducing the number of nodes “in the base layers” may slow down training progress.
  • There seems to be a lower limit on the number of units per Dense layer that is necessary to achieve an accuracy of 1.
  • With fewer than two hidden layers, an accuracy of 1 can no longer be reached, even if the number of units is increased drastically.
  • A higher learning rate (0.01 -> 0.05) may lead to training problems.

Looking at which neural networks were able to achieve an accuracy of 1, and comparing those, two more aspects become apparent:

  • Models consisting only of layers with 32 units were still able to achieve an accuracy of 1.
  • A large number of layers with only a small number of units (16) each does not guarantee an accuracy of 1. Moreover, models with many layers take longer to train.

In short, there seems to be an optimum between the number of layers and the number of units per layer: two to three hidden layers appear to be most efficient, and going below 64 units per Dense layer also does not look very promising.

Perhaps we will analyze these aspects further in a later blog post.
