How to calculate the loss for BCEWithLogitsLoss per epoch?

Hi,

Small question

Which loss is calculated correctly: the train_loss or the avg_loss?

This is the code:


# Training
for epoch in range(epochs):
    model.train()
    correct = 0
    loss_total = 0
    y_true = []
    y_pred = []
    train_loss = 0.0

    for data, target in train_val_loader:

        # Move the tensors to the configured device
        data, target = data.to(DEVICE), target.to(DEVICE)

        # Forward pass
        target = target.float()
        output = model(data.float())
        loss = criterion(output, target.unsqueeze(1))

        # Accumulate the batch-mean loss ...
        loss_total += loss.item()
        # ... and the loss weighted by the number of samples in the batch
        train_loss += loss.item() * data.size(0)

        # Backward pass and optimization step
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Predictions for BCEWithLogitsLoss: apply sigmoid, then round to 0/1
        pred = torch.round(torch.sigmoid(output.detach()))
        y_true.extend(target.tolist())
        y_pred.extend(pred.reshape(-1).tolist())

    # Normalize by the number of samples in the dataset
    train_loss /= len(train_val_loader.dataset)
    # Normalize by the number of batches (average loss per epoch)
    avg_loss = loss_total / len(train_val_loader)

    print("Accuracy on training set is", accuracy_score(y_true, y_pred),
          "train loss", train_loss, "avg_loss", avg_loss)

Both losses would be correct if the number of samples were divisible by the batch size without a remainder. train_loss is correct in all cases, since it normalizes by the real number of samples. The problem with avg_loss is that it normalizes by the number of batches, and batches can contain different numbers of samples (i.e. the last batch will be smaller if drop_last=True is not used), so the samples in a smaller last batch are weighted more heavily.
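In miniature, with purely hypothetical numbers just to show the arithmetic: suppose two batches produce mean losses of 2.0 and 4.0 with batch sizes 2 and 1.

# Hypothetical per-batch mean losses and batch sizes
batch_losses = [2.0, 4.0]
batch_sizes = [2, 1]

# Per-sample average (what train_loss computes): (2.0*2 + 4.0*1) / 3 = 2.67
weighted = sum(l * n for l, n in zip(batch_losses, batch_sizes)) / sum(batch_sizes)

# Mean of batch means (what avg_loss computes): (2.0 + 4.0) / 2 = 3.0
unweighted = sum(batch_losses) / len(batch_losses)

The same effect shows up in a full, runnable example.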

Assume your dataset has 11 samples and you are using a batch size of 2, which will create 5 batches containing 2 samples each and a final batch containing a single sample:

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

output = torch.arange(11).float().view(-1, 1)
output += torch.randn_like(output)
target = torch.arange(11).float().view(-1, 1)

dataset = TensorDataset(output, target)
loader = DataLoader(dataset, batch_size=2, drop_last=False)
criterion = nn.MSELoss()

loss_total = 0.
train_loss = 0.
for o, t in loader:
    print("output shape {}".format(o.shape))
    
    loss = criterion(o, t)
    print("current loss: {:.3f}, total loss in batch: {:.3f}".format(
        loss.item(), loss.item() * o.size(0)))
    
    loss_total += loss.item()

    train_loss += loss.item() * o.size(0)

# divide by number of samples in the dataset
train_loss /= len(loader.dataset)

# divide by number of batches
avg_loss = loss_total / len(loader)  

print("train_loss: {}".format(train_loss))
print("avg_loss: {}".format(avg_loss))

# output
output shape torch.Size([2, 1])
current loss: 1.705, total loss in batch: 3.411
output shape torch.Size([2, 1])
current loss: 0.220, total loss in batch: 0.440
output shape torch.Size([2, 1])
current loss: 3.629, total loss in batch: 7.259
output shape torch.Size([2, 1])
current loss: 0.833, total loss in batch: 1.666
output shape torch.Size([2, 1])
current loss: 1.937, total loss in batch: 3.874
output shape torch.Size([1, 1])
current loss: 1.216, total loss in batch: 1.216
train_loss: 1.624169945716858
avg_loss: 1.590177724758784
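
As a sanity check (assuming the output and target tensors from the snippet above are still in scope), computing the loss over the entire dataset in a single call reproduces train_loss, which confirms that normalizing by the number of samples is the correct approach:

# Loss over the full dataset in one call; MSELoss with the default
# reduction='mean' averages over all 11 samples, so this matches train_loss
full_loss = criterion(output, target).item()
print("full dataset loss: {}".format(full_loss))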

Thank you for the clarification :slight_smile:

It is also very useful to know because I have a small dataset.