Loss increases every time I run the model

Hi, the first time I trained the model, the loss started at 1.0 and gradually decreased on both the training and validation sets, but whenever I re-run it, the loss starts higher. It seems like the previous loss is being carried over. I am doing binary classification. What can I do to stop the loss from accumulating every time I run? I know we should initialize the weights of the model. How can I do this inside my architecture? Do I need to use a fixed seed, and if so, how? This is my code snippet:

# original code
import torch.nn as nn

class ConvNet(nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()
        
        #refer the article ... 
        self.conv1 = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=2, stride=1),
            nn.Dropout(0.5),
            nn.ReLU())
         
        self.conv2 = nn.Sequential(
            nn.Conv1d(32,32, kernel_size=2, stride=1),
            nn.Dropout(0.5),
            nn.ReLU(),
            nn.MaxPool1d(2,stride=3))
          
        self.conv3 = nn.Sequential(
            nn.Conv1d(32, 32, kernel_size=2, stride=1),
            nn.Dropout(0.5),
            nn.ReLU())
        
        #fully connected layers
        self.fc1 = nn.Linear(32*47,32)
        #self.fc2 = nn.Linear(32,1)
        self.fc2 = nn.Linear(32,1)
        #self.activation = nn.Softmax()
        #self.activation = nn.Sigmoid()
        
         # Initialization
        #nn.init.normal_(self.fc1.weight)
        #nn.init.normal_(self.fc2.weight)
    
    
    def forward(self, x):
        # expected Conv1d input shape: (minibatch_size, num_channels, width)
        batch_size=x.size(0)
        y = self.conv1(x.view(batch_size,1,-1))
        y = self.conv2(y)
        y = self.conv3(y)
        
        #print(y.size())
        batch_size= y.size(0)
        y = y.flatten(start_dim=1)
        #print(y.size())
        y = self.fc1(y.view(y.size(0), -1))
        #y = self.fc1(y.view(batch_size,1,-1))
        y = self.fc2(y.view(batch_size,1,-1))

        return y

This is what I get when I run the model.

total confusion matrix
Accuracy tensor(0.8333)
Sensitivity tensor([0.9778, 0.7279])
PPV tensor([0.7239, 0.9782])
Epoch 1 		 Training Loss: 4.26098953670633 		 Validation Loss: 9.583865708112716
Validation Loss Decreased(inf--->287.515971) 	 Saving The Model
total confusion matrix
Accuracy tensor(0.7490)
Sensitivity tensor([0.9975, 0.5691])
PPV tensor([0.6262, 0.9969])
Epoch 2 		 Training Loss: 4.444644435301871 		 Validation Loss: 16.55208384990692
total confusion matrix
Accuracy tensor(0.7406)
Sensitivity tensor([1.0000, 0.5569])
PPV tensor([0.6151, 1.0000])
Epoch 3 		 Training Loss: 3.656472649733375 		 Validation Loss: 17.436482475201288
total confusion matrix
Accuracy tensor(0.9479)
Sensitivity tensor([0.9879, 0.9177])
PPV tensor([0.9007, 0.9901])
Epoch 4 		 Training Loss: 3.4676516719190476 		 Validation Loss: 5.442984523872535
Validation Loss Decreased(287.515971--->163.289536) 	 Saving The Model

Hi Fathima, could you also share your training loop so we can assist?

Off the bat, one thing that looks suspicious is that your evaluation metrics (accuracy, sensitivity, and so on) are tensors. If you forgot to .detach() them from the graph, that could potentially affect your training. However, perhaps that's not the issue here.
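For example, something like this (illustrative names, using the output and target tensors from your loop) keeps a metric as a plain Python float with no graph attached:

# illustrative: detach the output (or call .item()) so metric tensors built from
# model outputs don't keep the autograd graph alive
probs = torch.sigmoid(output.detach())
accuracy = ((probs > 0.5).float() == target).float().mean().item()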

Hi Andrei,

Here is the training loop

import numpy as np
import torch
import matplotlib.pyplot as plt
import seaborn as sns

min_valid_loss = np.inf # track change in validation loss

Matrics = ConfusionMetrics()

#store loss train and valid
train_losses, valid_losses = [], []
epoch_train_loss,epoch_valid_loss = [],[]
epoch_train_losses,epoch_valid_losses=[],[]

for e in range(epochs):
    
    train_loss = 0.0
    valid_loss = 0.0
    
    model.train()     # Optional when not using Model Specific layer
    for data, target in train_loader:
         # Transfer Data to GPU if available
        if torch.cuda.is_available():
            data, target = data.cuda(), target.cuda()
            #print("targets.data", targets.item())
        
        
         # Clear the gradients
        optimizer.zero_grad()
        
        # Forward Pass
        output = (model(data))
        squeezed_output = torch.squeeze(output)
        
        # Find the Loss
        loss = criterion(squeezed_output.view(-1,1),target.view(-1,1))
        
        # Calculate gradients 
        loss.backward()
        
        # Update Weights
        optimizer.step()
        
        # Calculate Loss
        train_loss += loss.item() * data.size(0)

        # add the loss of the batch to the train_loss
        train_losses.append(train_loss)
        
    #compute the mean loss of the dataset    
    epoch_train_loss =torch.tensor(train_losses).mean()
    epoch_train_losses.append(epoch_train_loss)
        
        
    # Evaluation
    model.eval()     # Optional when not using Model Specific layer
    for data, target in valid_loader:
        
        if torch.cuda.is_available():
            data, target = data.cuda(), target.cuda()
        
        output = model(data)
        squeezed_output = torch.squeeze(output)
        #print("=====LOSS===========",target)
        
        loss = criterion(squeezed_output.view(-1,1),target.view(-1,1))
        valid_loss += loss.item() * data.size(0)
        
        #valid_loss = valid_loss/len(valid_loader.dataset)
        valid_losses.append(valid_loss)
        
        Matrics.accumulate(target,output)
        
    epoch_valid_loss=torch.tensor(valid_losses).mean()
    epoch_valid_losses.append(epoch_valid_loss)
                
                
    print('total confusion matrix')
    print('Accuracy',Matrics.accuracy())
    print('Sensitivity', Matrics.se())
    print('PPV', Matrics.ppv())
    
       
    print(f'Epoch {e+1} \t\t Training Loss: {train_loss / len(train_loader)} \t\t Validation Loss: {valid_loss / len(valid_loader)}')

   
    # confusion matrix and heatmap
    cm = Matrics.matrix()
    plt.figure()
    sns.heatmap(cm, annot=True, fmt='g')
    plt.xlabel('Predicted Value')
    plt.ylabel('True Value')
    reset = Matrics.reset()
   

    if min_valid_loss > valid_loss:
        print(f'Validation Loss Decreased({min_valid_loss:.6f}--->{valid_loss:.6f}) \t Saving The Model')
        min_valid_loss = valid_loss
        
         # Saving State Dict
        torch.save(model.state_dict(), 'saved_model.pth')
    

And these are the evaluation metric methods from my ConfusionMetrics class:

    def accuracy(self):
        return self.__matrix.diagonal().sum() / self.__matrix.sum()

    def se(self):
        return self.normalized_matrix('truth').diagonal()

    def ppv(self):
        return self.normalized_matrix('pred').diagonal()
    

I am not sure where exactly the problem is. Also, I am using BCEWithLogitsLoss as the criterion since this is a binary classification task.

OK, and just to be clear, what exactly is the issue? I see the training loss is generally decreasing and the validation loss is at its lowest after epoch 4 (though it jumps around a little before that). It seems like your performance is improving as you train longer.

The issue is that every time I run this model, the loss begins at higher numbers. The loss decreased by 287, which means it is adding the previous loss every time I run it. I don't want that.

Validation Loss Decreased(inf--->287.515971) 	 Saving The Model

One thing that does look incorrect: valid_loss is already a running sum of the validation losses across batches, so valid_losses ends up being a list of cumulative sums rather than per-batch losses.

If we call the losses for each validation batch L_0, L_1, L_2 and so on, we have:

valid_losses = [
    L_0, 
    L_0 + L_1,
    L_0 + L_1 + L_2,
    ...
    L_0 + L_1 + L_2 + ... + L_n
]

Evidently the mean of valid_losses is not what you want to compute, so the quantity you are putting into epoch_valid_loss and epoch_valid_losses is bogus.
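A tiny concrete example of the mismatch, with made-up batch losses:

batch_losses = [0.9, 0.8, 0.7]      # per-batch losses L_0, L_1, L_2
cumulative = []
total = 0.0
for L in batch_losses:
    total += L
    cumulative.append(total)         # cumulative == [0.9, 1.7, 2.4], like valid_losses

# mean(cumulative) is ~1.67, which is neither the per-batch mean (0.8)
# nor the epoch total (2.4) -- it doesn't measure anything useful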

However, in terms of what's actually getting printed, I think you're just conflating two different numbers. You said the loss decreased by 287, but that's not what the program is printing. The 287.51 is your total validation loss summed over all datapoints in the validation set; the 9.58 in the epoch line is that same total divided by the number of validation batches (287.515971 / 30 ≈ 9.58), i.e. the average per-batch validation loss (this suggests you're using a batch size of 32, btw). None of this is a problem in itself; these are just different numbers, and they don't suggest that the loss is somehow accumulating.

The inf is just there because you initialize the 'min validation loss' to infinity before it gets updated on the first epoch. It doesn't mean the previous loss is being added or anything like that.

Thank you. I am still unclear on how I could fix this to get a more meaningful result.

OK, to make the printout clearer and to record the right thing in epoch_valid_loss, you could do the following (a rough sketch combining these changes follows the list):

  1. get rid of valid_losses altogether
  2. change to: epoch_valid_loss=torch.tensor(valid_loss)
  3. in the printout, divide by the length of the validation loader as you do before:
        print(f'Validation Loss Decreased({min_valid_loss / len(valid_loader):.6f}--->{valid_loss / len(valid_loader):.6f}) \t Saving The Model')
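Putting those together, the end of your epoch loop could look roughly like this (just a sketch, keeping your variable names; min_valid_loss still starts at np.inf, so the first printout will show inf):

    # end of the epoch loop: record this epoch's total validation loss
    epoch_valid_loss = torch.tensor(valid_loss)
    epoch_valid_losses.append(epoch_valid_loss)

    if min_valid_loss > valid_loss:
        # divide by the number of validation batches so the printed values
        # match the per-batch average shown in the epoch summary line
        print(f'Validation Loss Decreased({min_valid_loss / len(valid_loader):.6f}--->'
              f'{valid_loss / len(valid_loader):.6f}) \t Saving The Model')
        min_valid_loss = valid_loss
        torch.save(model.state_dict(), 'saved_model.pth')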

I will try this and update. Also, is it necessary to initialize the weights in the model architecture, or does PyTorch do it by default?

When you instantiate a model, PyTorch initializes the weights by default; there's no need to do anything explicitly. When you load a saved model, the saved weights get loaded, of course.

There are ways to do custom / fancy weight initializations, but in the vast majority of use cases that's not required.
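For reference, if you ever do want custom initialization, one common pattern (just a sketch, not something you need here) is to apply an init function to the model's submodules after constructing it:

import torch.nn as nn

def init_weights(m):
    # illustrative: Kaiming-normal init for conv/linear weights, zero biases
    if isinstance(m, (nn.Conv1d, nn.Linear)):
        nn.init.kaiming_normal_(m.weight)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

model = ConvNet()
model.apply(init_weights)   # recursively applies init_weights to every submodule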

Hi Andrei,

Just to answer your previous question about the evaluation metrics: I used torch.no_grad() for the validation set. I also removed valid_losses/epoch_valid_losses altogether. Is that correct, or is it only valid_losses that needs to be eliminated?

In the snippet you shared, epoch_valid_losses wasn’t used anywhere so it’s fine to remove.

Hi Andrei,

In the code, I have used epoch_valid_losses.
So I trained the model: the training loss gradually decreased, but the validation loss went down and up. My dataset is split with a ratio of 80:10:10.
Also, the graph looks a little weird: if you look at the loss axis, the values start from 150. I believe this should start at 0.0. What is causing this, and how can it be resolved?

[image: loss plot]

Loss should not start from 0.0, because a loss of 0.0 would mean the model is absolutely perfect and makes no errors whatsoever, which is not realistic. Losses start high and decrease. You should train the model for longer; 4 epochs is quite short, and something like 100-300 epochs is more typical (depending on your exact problem and training setup).

Is it okay if the loss starts at 7.0 and decreases at each epoch? I am trying to understand how the whole NN works, from the architecture to running it.

I also have a question about how to check whether the model is outputting a probability.

One more question regarding the loss range. Looking at the picture below, you can see the training loss starts from 15 and the validation loss starts from 0.0059. Why is there this difference in scaling?

To interpret the outputs as probabilities, you can run them through a softmax layer:

from torch.nn.functional import softmax
probabilities = softmax(output, dim=1)

However, to be precise, please be warned that although these satisfy some of the criteria of probabilities (for example they sum to 1 across the different classes, just like mutually exclusive probabilities should, and the closer to 1 a value is the more confident your model is), they aren't necessarily well-calibrated probability estimates: a softmax output of 0.9 doesn't mean the prediction is right 90% of the time. Converting raw logits into calibrated probabilities is a separate problem. But for most applications, the softmax output is good enough.
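As a side note for the specific model in this thread: fc2 outputs a single value per example and you're training with BCEWithLogitsLoss, so for that setup a sigmoid (rather than a softmax over classes) gives the positive-class probability. A sketch:

import torch

logits = model(data)                      # shape (batch, 1, 1) given the forward() above
probs = torch.sigmoid(logits).view(-1)    # probability of the positive class per example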

Regarding the different scaling of the losses: this happens because you're summing losses, so your numbers depend on the size of your batches and potentially on the size of your dataset. If you want to put everything on an apples-to-apples basis and fix the scaling problem, I recommend doing something like this:

        valid_loss = loss.item()  # average per-datapoint loss within this minibatch
        valid_losses.append(valid_loss)  # a list of all per-datapoint losses across different minibatches

        # then, at the end of the epoch
        epoch_valid_loss = torch.tensor(valid_losses).mean()  # the average of the per-datapoint losses

Then you just plot epoch_valid_loss, and make the same changes for epoch_train_loss (a sketch of the training side is below). Then you'll be comparing apples to apples.
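For the training side, the same idea would look roughly like this (a sketch; note that train_losses should be re-initialized at the start of each epoch so the average only covers that epoch's batches):

        train_losses.append(loss.item())  # average per-datapoint loss within this minibatch

    # then, at the end of the epoch
    epoch_train_loss = torch.tensor(train_losses).mean()
    epoch_train_losses.append(epoch_train_loss)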

By the way, in addition to measuring the loss, you'll want to use accuracy and recall as metrics of how well your model is doing. You can look up what accuracy and recall are, and then compute them by converting the model output into a concrete prediction (with .argmax(dim=1) for a multi-class output, or by thresholding for a single-output binary model like yours) and keeping track of how often those predictions are correct (plus false positives, false negatives, and so on).
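For the single-logit binary setup used earlier in this thread, here is a sketch of computing accuracy and recall by thresholding the sigmoid output (names are illustrative):

import torch

with torch.no_grad():
    logits = model(data).view(-1)                  # one logit per example
    preds = (torch.sigmoid(logits) > 0.5).long()   # 1 = positive class, 0 = negative class
    labels = target.view(-1).long()

    accuracy = (preds == labels).float().mean().item()

    true_pos = ((preds == 1) & (labels == 1)).sum().item()
    actual_pos = (labels == 1).sum().item()
    recall = true_pos / max(actual_pos, 1)         # guard against no positives in the batch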