Which loss is calculated correctly here: the train_loss or the avg_loss?
This is the code:
# Training
import torch
from sklearn.metrics import accuracy_score

for epoch in range(epochs):
    model.train()
    loss_total = 0.0   # running sum of per-batch mean losses
    train_loss = 0.0   # running sum of per-sample losses
    y_true = []
    y_pred = []
    for data, target in train_val_loader:
        # move the tensors to the configured device
        data, target = data.to(DEVICE), target.to(DEVICE)
        # forward pass
        target = target.float()
        output = model(data.float())
        loss = criterion(output, target.unsqueeze(1))
        loss_total += loss.item()
        train_loss += loss.item() * data.size(0)
        # backward pass and optimization step
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # predictions for BCEWithLogitsLoss: apply sigmoid, then round at 0.5
        pred = torch.round(torch.sigmoid(output.detach()))
        y_true.extend(target.tolist())
        y_pred.extend(pred.reshape(-1).tolist())
    train_loss /= len(train_val_loader.dataset)   # normalize by the number of samples
    avg_loss = loss_total / len(train_val_loader) # normalize by the number of batches
    print("Accuracy on training set is", accuracy_score(y_true, y_pred),
          "train loss", train_loss, "avg_loss", avg_loss)
Both losses would be equal if the number of samples were divisible by the batch size without a remainder. train_loss is correct in all cases, as it normalizes by the real number of samples: since criterion returns the mean loss of each batch (the default reduction='mean'), multiplying by data.size(0) recovers the summed loss of that batch. The issue with avg_loss is that it normalizes by the number of batches, even though the batches can contain different numbers of samples (in particular the last batch, if drop_last=True is not used), so a smaller last batch gets the same weight as the full ones.
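To make the difference explicit (the notation here is mine): with B batches of sizes n_1, ..., n_B summing to N samples, and mean batch losses m_1, ..., m_B returned by the criterion:

train_loss = (n_1*m_1 + ... + n_B*m_B) / N    # exact mean over all samples
avg_loss   = (m_1 + ... + m_B) / B            # unweighted mean of the batch means

The two agree exactly when all n_b are equal, i.e. when the dataset size is divisible by the batch size or drop_last=True is used.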
Assume your dataset has 11 samples and you are using a batch size of 2, which will create 6 batches: 5 containing 2 samples each and a last batch containing a single sample:
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

output = torch.arange(11).float().view(-1, 1)
output += torch.randn_like(output)
target = torch.arange(11).float().view(-1, 1)
dataset = TensorDataset(output, target)
loader = DataLoader(dataset, batch_size=2, drop_last=False)
criterion = nn.MSELoss()

loss_total = 0.
train_loss = 0.
for o, t in loader:
    print("output shape {}".format(o.shape))
    loss = criterion(o, t)
    print("current loss: {:.3f}, total loss in batch: {:.3f}".format(
        loss.item(), loss.item() * o.size(0)))
    loss_total += loss.item()
    train_loss += loss.item() * o.size(0)

# divide by the number of samples in the dataset
train_loss /= len(loader.dataset)
# divide by the number of batches
avg_loss = loss_total / len(loader)
print("train_loss: {}".format(train_loss))
print("avg_loss: {}".format(avg_loss))
# output
output shape torch.Size([2, 1])
current loss: 1.705, total loss in batch: 3.411
output shape torch.Size([2, 1])
current loss: 0.220, total loss in batch: 0.440
output shape torch.Size([2, 1])
current loss: 3.629, total loss in batch: 7.259
output shape torch.Size([2, 1])
current loss: 0.833, total loss in batch: 1.666
output shape torch.Size([2, 1])
current loss: 1.937, total loss in batch: 3.874
output shape torch.Size([1, 1])
current loss: 1.216, total loss in batch: 1.216
train_loss: 1.624169945716858
avg_loss: 1.590177724758784
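You can verify the printed values by hand: train_loss = (3.411 + 0.440 + 7.259 + 1.666 + 3.874 + 1.216) / 11 ≈ 1.624, while avg_loss = (1.705 + 0.220 + 3.629 + 0.833 + 1.937 + 1.216) / 6 ≈ 1.590. As a further sanity check (a minimal sketch reusing the same setup; the name full_loss is mine), the mean loss computed directly over the whole dataset matches train_loss, not avg_loss:

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

output = torch.arange(11).float().view(-1, 1)
output += torch.randn_like(output)
target = torch.arange(11).float().view(-1, 1)
loader = DataLoader(TensorDataset(output, target), batch_size=2, drop_last=False)
criterion = nn.MSELoss()

# accumulate the summed loss per batch, then normalize by the dataset size
train_loss = sum(criterion(o, t).item() * o.size(0) for o, t in loader)
train_loss /= len(loader.dataset)
# the exact per-sample mean, computed in a single call over all 11 samples
full_loss = criterion(output, target).item()
print(train_loss, full_loss)  # identical up to floating point precision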