My PyTorch model is giving very bad results

I am new to deep learning with PyTorch. I am more experienced with TensorFlow, so I should say that I am not new to deep learning itself.

Currently, I am working on a simple ANN classifier. There are only 2 classes, so quite naturally I am using a Sigmoid + BCELoss combination.

The dataset is like this:

Shape of X_train: (891, 7)
Shape of Y_train: (891,)
Shape of x_test: (418, 7)

I converted X_train and the other arrays to torch tensors (train_data, train_label, and so on). The next step is:

from torch.utils.data import TensorDataset, DataLoader

train_ds = TensorDataset(train_data, train_label)
# Define data loader
batch_size = 32
train_dl = DataLoader(train_ds, batch_size, shuffle=True)
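
For completeness, the conversion itself was roughly like this (just a sketch; the main point is that both the inputs and the labels are float32, because BCELoss expects float targets):

import torch

# Rough sketch of the conversion from the numpy arrays above.
# BCELoss expects float targets, so the labels are float32 as well.
train_data = torch.tensor(X_train, dtype=torch.float32)
train_label = torch.tensor(Y_train, dtype=torch.float32)
test_data = torch.tensor(x_test, dtype=torch.float32)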

I defined the model class like this:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
   
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(7, 32)
        self.bc1 = nn.BatchNorm1d(32)
        self.fc2 = nn.Linear(32, 64)
        self.bc2 = nn.BatchNorm1d(64)
        self.fc3 = nn.Linear(64, 128)
        self.bc3 = nn.BatchNorm1d(128)
        self.fc4 = nn.Linear(128, 32)
        self.bc4 = nn.BatchNorm1d(32)
        self.fc5 = nn.Linear(32, 10)
        self.bc5 = nn.BatchNorm1d(10)
        self.fc6 = nn.Linear(10, 1)
        self.bc6 = nn.BatchNorm1d(1)
        
        self.drop = nn.Dropout2d(p=0.5)
        
        
    def forward(self, x):
        torch.nn.init.xavier_uniform(self.fc1.weight)
        x = self.fc1(x)
        x = self.bc1(x)
        x = F.relu(x)
        
        x = self.drop(x)
        x = self.fc2(x)
        x = self.bc2(x)
        x = F.relu(x)
        
        #x = self.drop(x)
        x = self.fc3(x)
        x = self.bc3(x)
        x = F.relu(x)
        
        x = self.drop(x)
        x = self.fc4(x)
        x = self.bc4(x)
        x = F.relu(x)
        
        #x = self.drop(x)
        x = self.fc5(x)
        x = self.bc5(x)
        x = F.relu(x)
        
        x = self.drop(x)
        x = self.fc6(x)
        x = self.bc6(x)        
        x = torch.sigmoid(x)
        return x
    
model = Net()

The loss function and the optimizer are defined:

loss = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.00001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)

Finally, I run the training loop for the given number of epochs:

num_epochs = 1000
# Repeat for given number of epochs
for epoch in range(num_epochs):
        
    # Train with batches of data
    for xb,yb in train_dl:
        pred = model(xb)
        
        yb = torch.unsqueeze(yb, 1)
        
        #print(pred, yb)
        print('grad', model.fc1.weight.grad)
        
        l = loss(pred, yb)
        #print('loss',l)
                    
        # 3. Compute gradients
        l.backward()
            
        # 4. Update parameters using gradients
        optimizer.step()
            
        # 5. Reset the gradients to zero
    optimizer.zero_grad()
    
    # Print the progress
    if (epoch+1) % 10 == 0:
        print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, num_epochs, l.item()))

I can see in the output that after each pass over all the batches the gradients of the weights are non-zero, and after that zero_grad is applied.

However, the model is pretty bad. I get an F1 score of only around 50%, and the model is just as bad when I use it to predict on train_dl itself!

I am wondering what the reason is. Are the gradients of the weights non-zero but the weights not updating properly? Is the optimizer not optimizing the weights? Or is it something else?

Can someone please have a look?

I have already tried different loss functions and optimizers, as well as smaller datasets, bigger batches, and different hyperparameters.

Thanks! :)

One thing I see at first is that your optimizer.zero_grad() should happen every step, not every epoch. That could be throwing off your gradients, so just add another indent so that it sits inside the dataloader loop rather than the epoch loop. Also, this isn't a big problem, but it is better to use BCEWithLogitsLoss and no sigmoid for training. You can read more about why in the docs here.
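
For the loss part, roughly what I mean is something like this (just a sketch with placeholder names; the key change is that the model returns raw logits and the sigmoid is only applied when you want predictions):

# Sketch: drop torch.sigmoid from Net.forward so the model returns raw logits,
# and let BCEWithLogitsLoss apply the sigmoid internally (more numerically stable).
loss_fn = nn.BCEWithLogitsLoss()

logits = model(xb)                    # raw logits, shape (batch_size, 1)
l = loss_fn(logits, yb.unsqueeze(1))  # targets must be float and match the logits' shape

# For predictions / evaluation, apply the sigmoid yourself:
probs = torch.sigmoid(logits)
preds = (probs > 0.5).float()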

Do you mean something like this:

num_epochs = 1000
# Repeat for given number of epochs
for epoch in range(num_epochs):
        
    # Train with batches of data
    for xb,yb in train_dl:
        pred = model(xb)
        
        yb = torch.unsqueeze(yb, 1)
        
        optimizer.zero_grad()
        
        #print(pred, yb)
        #print('grad', model.fc1.weight.grad)
        
        l = loss(pred, yb)
        #print('loss',l)
                    
        # 3. Compute gradients
        l.backward()
            
        # 4. Update parameters using gradients
        optimizer.step()
            
        # 5. Reset the gradients to zero
    #optimizer.zero_grad()
    
    # Print the progress
    if (epoch+1) % 10 == 0:
        print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, num_epochs, l.item()))

?

Yes, that is what I mean. PyTorch accumulates gradients by default, so if you only zero them once per epoch, every batch's gradients get added on top of the previous ones and the updates get messed up.
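
You can see the accumulation with a tiny standalone example (unrelated to your model, just to show that .grad adds up across backward() calls until you clear it):

import torch

w = torch.ones(1, requires_grad=True)

# First backward pass: d(2*w)/dw = 2
(2 * w).sum().backward()
print(w.grad)   # tensor([2.])

# Second backward pass without zeroing: the gradient is added, not replaced
(2 * w).sum().backward()
print(w.grad)   # tensor([4.])

# Clearing it (which is effectively what optimizer.zero_grad() does for each
# parameter) makes the next backward pass give the correct per-step gradient again
w.grad.zero_()
(2 * w).sum().backward()
print(w.grad)   # tensor([2.])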