Pandas to Tensor fed to nn, model parameters not updating

Hello, I’m starting to use PyTorch and want to implement a simple working example of binary classification. I have some data that is pre-processed with pandas, and I then convert the training and test sets to tensors. I define a class with 2 hidden layers with ReLU activations and a single output (logit), the optimizer as Adam over the model parameters, and the loss function as BCEWithLogitsLoss. When I create an instance, the parameters are uniformly distributed (the default initialization), but during training the loss does not decrease and all model parameters remain the same. There is no missing data in the tensors, and I made sure that the untrained network produces a single output per sample as expected.

I suspect the error comes from the loss.backward() or optimizer.step() steps. Here is a sample of the data and the code:

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader

# Define tensors to train the model
x_train_t = torch.tensor(x_train.values, dtype=torch.float)
x_test_t = torch.tensor(x_test.values, dtype=torch.float)
y_train_t = torch.tensor(y_train.values, dtype=torch.float)
y_test_t = torch.tensor(y_test.values, dtype=torch.float)


class NetClassifier(nn.Module):
    def __init__(self,hidden1,hidden2):
        super(NetClassifier,self).__init__()
        self.model = nn.Sequential(
            nn.Linear(13,hidden1),
            nn.ReLU(),
            nn.Linear(hidden1,hidden2),
            nn.ReLU(),
            nn.Linear(hidden2,1)
        )
    def forward(self,x):
        x = self.model(x)
        return x

# Create an instance of the class
net = NetClassifier(50,50)
print(net)


# Create DataSet and DataLoaders to train the neural network
EPOCHS = 5
BATCH_SIZE=32
LEARNING_RATE = 0.01

train_data_set = TensorDataset(x_train_t,y_train_t) 
test_data_set = TensorDataset(x_test_t,y_test_t)

train_dataloader = DataLoader(train_data_set,batch_size = BATCH_SIZE)
test_dataloader = DataLoader(test_data_set,batch_size = BATCH_SIZE)


# Define optimizer and loss function
optimizer = optim.Adam(net.parameters(),lr=LEARNING_RATE)
loss_function = nn.BCEWithLogitsLoss()


for epoch in range(EPOCHS):
    epoch_loss=0
    epoch_acc=0
    for xb,yb in train_dataloader:
        # Zero gradients for training
        optimizer.zero_grad()
        # Use current model parameters to predict output
        y_pred = net(xb)
        # Turn probabilities into prediction 
        pred = torch.round(torch.sigmoid(torch.flatten(y_pred)))
        # Calculate loss, use float type to calculate loss
        loss = loss_function(yb,pred)
        # Backpropagate
        loss.backward()
        # Step in the optimizer
        optimizer.step()
        epoch_loss+=loss.item()
        epoch_acc+= (yb == pred).float().mean()
    # Print epoch loss and accuracy at the end of each epoch
    print("Epoch {:>02d} | Loss {:.5f} | Acc {:.3f}".format(epoch, epoch_loss/len(train_dataloader), epoch_acc/len(train_dataloader)))


Hi!

I think the issue is with this line:

pred = torch.round(torch.sigmoid(torch.flatten(y_pred)))

The gradient of torch.round is 0 with respect to all of its inputs, since rounding is piecewise constant. So, for example,

x=torch.randn(10, requires_grad=True) #returns tensor([-0.0534,  1.1239, ...])
y=torch.round(x)                      #round these values
y.backward(torch.ones_like(y))        #calculate gradient of round(x) with respect to x
x.grad                                #returns tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

And, given that this line comes after your network’s forward pass, all of the gradients of your network’s weights and biases have to flow through it, and will therefore be zero! Perhaps pass in torch.sigmoid(torch.flatten(y_pred)) rather than the rounded value, and see if that works!
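
For comparison, here is the same toy experiment with torch.sigmoid in place of torch.round (just a sketch with random values, not your actual data); the gradients come out non-zero, so they can flow back into your weights:

x = torch.randn(10, requires_grad=True)
y = torch.sigmoid(x)                  # sigmoid is differentiable, unlike round
y.backward(torch.ones_like(y))        # gradient of sigmoid(x) with respect to x
x.grad                                # non-zero values: sigmoid(x) * (1 - sigmoid(x))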

Thank you for the feedback. The problem was with using BCEWithLogitsLoss on the network’s single output while also applying a sigmoid before the loss: since BCEWithLogitsLoss already includes the sigmoid, it was actually performing a sigmoid of a sigmoid before computing the negative log likelihood. Thankfully the community has plenty of examples, so I was able to get to the bottom of the issue.
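
For anyone who runs into the same thing, the relevant part of my training loop now looks roughly like this (just a sketch, not the full code): the raw logits go straight into BCEWithLogitsLoss, and sigmoid/round are only used for the accuracy metric.

for xb, yb in train_dataloader:
    optimizer.zero_grad()
    y_pred = net(xb)                                   # raw logits, shape (batch_size, 1)
    loss = loss_function(torch.flatten(y_pred), yb)    # BCEWithLogitsLoss expects (input, target)
    loss.backward()
    optimizer.step()
    epoch_loss += loss.item()
    # Sigmoid + round only for the accuracy metric, outside the gradient path
    with torch.no_grad():
        pred = torch.round(torch.sigmoid(torch.flatten(y_pred)))
        epoch_acc += (yb == pred).float().mean().item()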


I wasn’t aware of the issue with BCEWithLogitsLoss, but torch.round definitely causes your gradients to equal 0 as well. So make sure to remove torch.round from the loss computation too, to ensure gradients flow to all of your parameters!