Hi all, I am new to neural networks and PyTorch and have a problem that I hope someone can help me with.
After trying the standard MNIST digit problem, I have been working on the Chess Positions dataset from Kaggle.
First, I trained a CNN to identify chess pieces on individual squares using a balanced dataset of each piece type (including empty squares). This got me up to 99% accuracy, but that is not sufficient to robustly identify all the pieces on a chess board, since 0.99^64 = 0.53.
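As a quick check of that arithmetic, here is a minimal sketch. It assumes the errors on the 64 squares are independent, which is a simplification, and the helper names `board_accuracy` and `required_square_accuracy` are only illustrative.

```python
def board_accuracy(square_accuracy: float, n_squares: int = 64) -> float:
    """Probability that every square is right, assuming independent errors."""
    return square_accuracy ** n_squares


def required_square_accuracy(target_board_accuracy: float, n_squares: int = 64) -> float:
    """Per-square accuracy needed to reach a target whole-board accuracy."""
    return target_board_accuracy ** (1.0 / n_squares)


print(f"{board_accuracy(0.99):.3f}")            # 0.526
print(f"{required_square_accuracy(0.99):.5f}")  # 0.99984
```

Under that independence assumption, a 99% whole-board accuracy needs roughly 99.98% accuracy per square.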
I improved things by submitting each board as a batch of 64 images of the individual squares from the same board. I reasoned that this would let the network better distinguish pieces, especially light from dark, because every image in a batch shares the same board style (the dataset contains a mix of both chess piece styles and chess board styles). This worked, and I got to 0.9999 accuracy on individual squares, which gave the expected 0.9999^64 = 0.99 accuracy on entire boards.
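For reference, cutting one board into its 64 squares can be done without loops. This is only a sketch: it assumes a channels-first 3x400x400 board and 50x50 squares (the square size used in the model code in this post), it uses a random array in place of a real image, and `split_board` is an illustrative name.

```python
import numpy as np


def split_board(board: np.ndarray, square: int = 50) -> np.ndarray:
    """Cut a (C, 8*square, 8*square) board into (64, C, square, square) squares.

    Squares are ordered row by row, so index 8*i + j is row i, column j.
    """
    c, h, w = board.shape
    rows, cols = h // square, w // square
    # Split both spatial axes into (block index, offset inside block).
    blocks = board.reshape(c, rows, square, cols, square)
    # Move the two block indices to the front, then merge them into one axis.
    return blocks.transpose(1, 3, 0, 2, 4).reshape(rows * cols, c, square, square)


# Random stand-in for one image; real data would come from the dataset.
board = np.random.default_rng(0).integers(0, 256, size=(3, 400, 400), dtype=np.uint8)
squares = split_board(board)
print(squares.shape)  # (64, 3, 50, 50)
```

The resulting array can be fed to a per-square network as one batch of 64 images.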
However, to play around further, I want to make a network that takes in the entire board, splits the input into the individual squares, runs them all through the same convolutional layers, and then uses one or two final linear layers to combine the outputs and correct for any errors due to the different light/dark backgrounds.
This is the model
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
# Assuming that we are on a CUDA machine, this should print a CUDA device:
print(device)
class Net2(nn.Module):
    def __init__(self, device):
        super(Net2, self).__init__()
        self.device = device
        # shared feature extractor applied to every 3x50x50 square
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 5),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 5),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Dropout2d(0.25),
            nn.Flatten(),
            nn.Linear(5184, 128),  # 64 channels * 9 * 9 after the two conv/pool stages
            nn.ReLU(),
            nn.Dropout(0.5)
        )
        # fully connected layer that outputs the logits for our 13 labels for each of the 64 squares
        self.fc2 = nn.Linear(128*64, 13*64)

    def forward(self, x):
        rows = torch.split(x, 50, dim=2)  # tuple of 8 Nx3x50x400 tensors
        out = torch.empty((rows[0].shape[0], 128*64), dtype=torch.float32, device=self.device)
        for i in range(8):
            squares = torch.split(rows[i], 50, dim=3)  # tuple of 8 Nx3x50x50 tensors
            for j in range(8):
                s = 8*i + j  # square index, row by row
                out[:, s*128:(s+1)*128] = self.conv(squares[j])
        out = self.fc2(out)
        output = out.reshape(-1, 13, 64)  # N x 13 classes x 64 squares
        return output
my_nn2 = Net2(device)
print(my_nn2)
and I use the following loss function and optimizer:
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(my_nn2.parameters(), lr=0.001, momentum=0.9)
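For context on the shapes involved: with `nn.CrossEntropyLoss`, logits of shape (N, C, d1) are paired with integer targets of shape (N, d1), and the default mean reduction averages over all N*d1 entries. The sketch below uses random tensors (not the chess data) to check that this matches treating every square as its own sample.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
N, C, S = 4, 13, 64  # boards, classes, squares per board

logits = torch.randn(N, C, S)         # same layout as the model output
labels = torch.randint(0, C, (N, S))  # one class index per square

criterion = nn.CrossEntropyLoss()
loss_board = criterion(logits, labels)

# Equivalent view: flatten the squares into the batch dimension.
flat_logits = logits.permute(0, 2, 1).reshape(N * S, C)
flat_labels = labels.reshape(N * S)
loss_flat = criterion(flat_logits, flat_labels)

print(torch.allclose(loss_board, loss_flat))  # True
```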
I train it like this:
n_epochs = 10
accuracy_train = np.zeros(n_epochs,)
batch_size = 100
N_samples = x_train.shape[0]
all_predictions = np.empty((N_samples, 64), dtype='int64')
for epoch in range(n_epochs):  # loop over the dataset multiple times
    running_loss = 0.0
    for i in range(0, N_samples, batch_size):
        batch = torch.tensor(x_train[i:(i+batch_size), :, :, :], dtype=torch.float32).to(device)/255
        labels = torch.tensor(y_train[i:(i+batch_size), :], dtype=torch.long).to(device)
        # zero the parameter gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        outputs = my_nn2(batch)  # batch_size x 13 x 64
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # print statistics: mean loss over the last 20 batches (2000 boards)
        running_loss += loss.item()
        if (i // batch_size + 1) % 20 == 0:
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 20:.3f}')
            running_loss = 0.0
    # after each epoch, run again on the training data
    my_nn2.eval()  # switch off dropout while measuring accuracy
    with torch.no_grad():
        for i in range(0, N_samples, batch_size):
            batch = torch.tensor(x_train[i:(i+batch_size), :, :, :], dtype=torch.float32).to(device)/255
            outputs2 = my_nn2(batch)
            batch_predictions = outputs2.argmax(dim=1)  # most likely class per square, N x 64
            all_predictions[i:(i+batch_size), :] = batch_predictions.cpu().numpy()
    my_nn2.train()  # back to training mode for the next epoch
    # a board counts as correct only if all 64 squares are correct
    accuracy_train[epoch] = np.mean(np.all(y_train == all_predictions, axis=1))
    print(f'[{epoch + 1}, {i + 1:5d}] training accuracy: {accuracy_train[epoch]}')
print('Finished Training')
Each label is one of 13 integer classes, covering the light pieces, the dark pieces, and the empty square.
My problem is that training does not seem to be working. The model converges to predicting entirely empty boards. There are only 5-15 pieces per board, so the majority of squares are empty, but this was not a problem with my previous network that took in individual squares.
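To put numbers on that imbalance, here is a small sketch of the baseline an "always empty" predictor reaches. The label value for an empty square (`EMPTY = 0`) and the toy label array are assumptions for illustration only, not taken from the dataset.

```python
import numpy as np

EMPTY = 0  # assumed label for an empty square (hypothetical encoding)

# With 5 to 15 pieces on 64 squares, the empty fraction per board is:
print(1 - 15 / 64, 1 - 5 / 64)  # 0.765625 0.921875

# Toy labels: 3 boards x 64 squares, with 5, 10 and 15 pieces (all class 1).
y_toy = np.full((3, 64), EMPTY, dtype=np.int64)
for board, n_pieces in enumerate((5, 10, 15)):
    y_toy[board, :n_pieces] = 1

all_empty = np.full_like(y_toy, EMPTY)
square_acc = np.mean(all_empty == y_toy)                 # per-square accuracy
board_acc = np.mean(np.all(all_empty == y_toy, axis=1))  # whole-board accuracy
print(square_acc, board_acc)  # 0.84375 0.0
```

So an all-empty prediction already gets most squares right while scoring zero on the whole-board accuracy computed after each epoch.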
Does anyone have any suggestions or see any mistakes in my code?
Thanks
Sean