Gradients not being set for PyTorch model

I am trying to run the following code for a neural network chess evaluation function. The dataset appears to be correct, but when I try to train this model, the gradients are always zero. Not sure where I am going wrong.

Thanks for any help!

import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim


class EvaulationFunction(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(64*12+6, 70)
        self.fc2 = nn.Linear(70, 1)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x


data = pd.read_csv("utils/data.csv", header=None)
torch_data = torch.tensor(data.values)
train_len = (int(len(torch_data)*.9)//16)*16  # round the 90% train split down to a multiple of the batch size
test_len = len(torch_data)-train_len
train, test = torch.utils.data.random_split(torch_data, [train_len, test_len])
trainloader = torch.utils.data.DataLoader(train, batch_size=16)
testloader = torch.utils.data.DataLoader(test, batch_size=16)

evf = EvaulationFunction()

criterion = nn.L1Loss()
optimizer = optim.SGD(evf.parameters(), lr=.1, momentum=0.9)
for epoch in range(10):

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # split the batch tensor into inputs (all but the last column) and labels (last column)
        inputs = data[:, :-1].type(torch.float32)
        labels = data[:, -1].type(torch.float32)
        print(inputs)
        print(labels)
        # zero the parameter gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        outputs = evf(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()


        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
            running_loss = 0.0
            #print(list(evf.parameters()))

torch.save(evf.state_dict(), "model.pt")

Your code seems to work fine for me and the gradients contain non-zero values:

import torch
import torch.nn as nn
import torch.nn.functional as F


class EvaulationFunction(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(64*12+6, 70)
        self.fc2 = nn.Linear(70, 1)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x


evf = EvaulationFunction()

criterion = nn.L1Loss()
optimizer = torch.optim.SGD(evf.parameters(), lr=.1, momentum=0.9)

inputs = torch.randn(16, 64*12+6)
labels = torch.randn(16, 1)
for epoch in range(10):
    # zero the parameter gradients
    optimizer.zero_grad()
    # forward + backward + optimize
    outputs = evf(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
    print("epoch {}, loss {:.3f}".format(epoch, loss.item()))
    print("grad.abs().sum() {}".format(torch.sum(torch.tensor([p.grad.abs().sum() for p in evf.parameters()]))))

It still isn’t converging, and I believe the model isn’t updating its parameters. (Every epoch, the average loss over each window of 2000 mini-batches is printed, and it doesn’t change at all between epochs.)
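
A quick way to check that (just a sketch, reusing the model, loss, optimizer, and one batch of inputs/labels from the loop above) is to compare the parameters before and after a single optimizer step:

# Sketch: verify whether a single optimizer step actually changes the parameters.
# Assumes evf, criterion, optimizer and one batch of inputs/labels from above.
before = [p.detach().clone() for p in evf.parameters()]

optimizer.zero_grad()
loss = criterion(evf(inputs), labels)
loss.backward()

for name, p in evf.named_parameters():
    print(name, "grad abs sum:", p.grad.abs().sum().item())

optimizer.step()

for (name, p), b in zip(evf.named_parameters(), before):
    print(name, "changed by:", (p.detach() - b).abs().sum().item())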

My code snippet is able to overfit random data samples if I change the optimizer to e.g. Adam and the predictions show a small error:
[screenshot of the training output showing the small prediction error]

Maybe the problem is with my dataset? Each input is a sparse vector of 774 floats and each label is a single floating-point value.
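
For reference, a quick sanity check on one batch looks like this (just a sketch using the loader and model from above; the shapes in the comments are what I expect, not verified output):

# Sketch: inspect the shapes of one batch from the training loader.
batch = next(iter(trainloader))
inputs = batch[:, :-1].float()
labels = batch[:, -1].float()
print(inputs.shape)        # expected: torch.Size([16, 774])
print(labels.shape)        # expected: torch.Size([16])
print(evf(inputs).shape)   # expected: torch.Size([16, 1])
# Note: nn.L1Loss will broadcast the [16, 1] output against the [16] target
# (and warn about it); labels.unsqueeze(1) would make the shapes match explicitly.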

I would recommend trying to overfit a small subset of your dataset (e.g. just 10 samples) first by playing around with some hyperparameters (e.g. the learning rate). This is how I’ve tested your code and made sure the model itself is able to overfit random (noise) data.
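
Something along these lines could work as a starting point (just a sketch; the subset size, Adam, and the learning rate are arbitrary choices, and I’m assuming the same torch_data tensor and model class from your post):

# Sketch: try to overfit a tiny fixed subset of the dataset.
# If the loss does not drop towards zero, the issue is more likely in the data,
# the shapes, or the hyperparameters than in the model definition.
subset = torch_data[:10]
inputs = subset[:, :-1].float()
labels = subset[:, -1:].float()   # keep shape [10, 1] to match the model output

evf = EvaulationFunction()
criterion = nn.L1Loss()
optimizer = torch.optim.Adam(evf.parameters(), lr=1e-3)

for step in range(1000):
    optimizer.zero_grad()
    loss = criterion(evf(inputs), labels)
    loss.backward()
    optimizer.step()
    if step % 100 == 0:
        print("step {}, loss {:.4f}".format(step, loss.item()))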

I’m not entirely clear on your objective in this instance, but I do know that in chess, positional information is important. In fact, Google’s AlphaZero was designed with this in mind, using a series of convolutional layers before the linear layers.

When you flatten the chessboard, you end up decorrelating much of that positional information.

On the other hand, it looks like you also have some scalar information (i.e. the +6 in 64*12+6) you’d like the model to consider. You’d probably be best off adding these features to the subsequent linear layers after the CNN, assuming they don’t contain any positional information.
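
As a rough sketch of what I mean (the reshape assumes the first 64*12 entries of your input vector are 12 piece planes over the 8x8 board and the last 6 are the scalar features; the channel counts and kernel sizes are just placeholders):

class ConvEvaluation(nn.Module):
    # Hypothetical sketch: run the board part of the input through Conv2d layers,
    # then concatenate the 6 scalar features before the linear layers.
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(12, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 32, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(32 * 8 * 8 + 6, 70)
        self.fc2 = nn.Linear(70, 1)

    def forward(self, x):
        # x has shape [batch, 64*12 + 6]
        board = x[:, :64 * 12].view(-1, 12, 8, 8)
        scalars = x[:, 64 * 12:]
        board = F.relu(self.conv1(board))
        board = F.relu(self.conv2(board))
        x = torch.cat([board.flatten(start_dim=1), scalars], dim=1)
        x = F.relu(self.fc1(x))
        return self.fc2(x)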

OK, I got this to work, but it’s still not working on the chess position data. It barely changes the weights at all.

I’m trying to see if a CNN will help, but I’m not sure how to make the dimensions work:

self.cnn = nn.Conv1d(16, 64*12+6, 70)
self.fc1 = nn.Linear(70, 70)
self.fc2 = nn.Linear(70, 1)

is not working.
Sorry, I’m quite new to PyTorch.