Why is the validation loss lower than the training loss?

Hi y'all!

I'm trying to train a binary classification model, and I've observed that my validation loss is much lower than my training loss. How is that possible?

Here’s the training loop.

model.train()
for e in tqdm(range(1, EPOCHS+1)):
    train_epoch_loss = 0

    for x_batch_train, y_batch_train in train_loader:
        x_batch_train, y_batch_train = x_batch_train.to(device), y_batch_train.to(device)

        optimizer.zero_grad()

        y_pred_probs_train = model(x_batch_train).squeeze()

        train_loss = criterion(y_pred_probs_train, y_batch_train)

        train_loss.backward()

        optimizer.step()

        train_epoch_loss += train_loss.item()

    with torch.no_grad():
        model.eval()
        for x_batch_val, y_batch_val in validation_loader:
            val_epoch_loss = 0

            x_batch_val, y_batch_val = x_batch_val.to(device), y_batch_val.to(device)

            y_pred_probs_val = model(x_batch_val).squeeze()

            val_loss = criterion(y_pred_probs_val, y_batch_val)

            val_epoch_loss += val_loss.item()

    print(f'Epoch {e+0:02}: | Train Loss: {train_epoch_loss/len(train_loader):.5f} | Val Loss: {val_epoch_loss/len(validation_loader):.5f}')

And here is the output of the loop:

########## OUTPUT ################

Epoch 01: | Train Loss: 16.43517 | Val Loss: 0.45582
Epoch 02: | Train Loss: 5.16361 | Val Loss: 0.05326
Epoch 03: | Train Loss: 0.69327 | Val Loss: 0.05327
Epoch 04: | Train Loss: 0.69327 | Val Loss: 0.05327
Epoch 05: | Train Loss: 0.69319 | Val Loss: 0.05329
Epoch 06: | Train Loss: 0.69314 | Val Loss: 0.05330
Epoch 07: | Train Loss: 0.69310 | Val Loss: 0.05334
Epoch 08: | Train Loss: 0.69302 | Val Loss: 0.05336
Epoch 09: | Train Loss: 0.69291 | Val Loss: 0.05335
Epoch 10: | Train Loss: 0.69270 | Val Loss: 0.05330
Epoch 11: | Train Loss: 0.69020 | Val Loss: 0.05365
Epoch 12: | Train Loss: 0.68584 | Val Loss: 0.05278
Epoch 13: | Train Loss: 0.68309 | Val Loss: 0.05325
Epoch 14: | Train Loss: 0.68111 | Val Loss: 0.05341
Epoch 15: | Train Loss: 0.67870 | Val Loss: 0.05416
Epoch 16: | Train Loss: 0.67404 | Val Loss: 0.05502
Epoch 17: | Train Loss: 0.67135 | Val Loss: 0.05591
Epoch 18: | Train Loss: 0.66845 | Val Loss: 0.05643
Epoch 19: | Train Loss: 0.66542 | Val Loss: 0.05629
Epoch 20: | Train Loss: 0.66551 | Val Loss: 0.05840

The class distributions in my train and validation sets are pretty balanced -
Train - {'class_1': 199, 'class_0': 201}
Validation - {'class_1': 50, 'class_0': 48}

I used random_split() to create the train and validation sets from my original dataset.
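Roughly like this, in case it matters (not my exact code; dataset and BATCH_SIZE here are placeholders):

from torch.utils.data import DataLoader, random_split

# 400 train / 98 validation, matching the class counts above
train_set, val_set = random_split(dataset, [400, 98])
train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True)
validation_loader = DataLoader(val_set, batch_size=BATCH_SIZE)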

The test performance is even worse -

# Classification Report

              precision    recall  f1-score   support

           0       0.50      1.00      0.67       250
           1       0.00      0.00      0.00       250

    accuracy                           0.50       500
   macro avg       0.25      0.50      0.33       500
weighted avg       0.25      0.50      0.33       500


# Confusion Matrix
[[250   0]
 [250   0]]

Here’s my model -

HotDogClassifier(
  (block1): Sequential(
    (0): Conv2d(3, 256, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): Dropout2d(p=0.1, inplace=False)
  )
  (block2): Sequential(
    (0): Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): Dropout2d(p=0.1, inplace=False)
  )
  (block3): Sequential(
    (0): Conv2d(128, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): Dropout2d(p=0.1, inplace=False)
  )
  (lastcnn): Conv2d(64, 2, kernel_size=(56, 56), stride=(1, 1))
  (maxpool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)

What am I doing wrong?

val_epoch_loss = 0

is inside the validation for loop; it should be initialized before the loop. As written, the accumulator is reset on every batch, so at the end of the epoch val_epoch_loss holds only the last batch's loss, and dividing that by len(validation_loader) makes the reported validation loss look far lower than it actually is.
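Something like this (a sketch of just the validation part of your epoch loop, reusing your variable names):

    val_epoch_loss = 0  # initialize once per epoch, before iterating over batches
    with torch.no_grad():
        model.eval()
        for x_batch_val, y_batch_val in validation_loader:
            x_batch_val, y_batch_val = x_batch_val.to(device), y_batch_val.to(device)
            y_pred_probs_val = model(x_batch_val).squeeze()
            val_loss = criterion(y_pred_probs_val, y_batch_val)
            val_epoch_loss += val_loss.item()  # accumulate across all batches
    model.train()  # switch back to train mode for the next epoch

On that last line: in your snippet, model.train() is only called once before the epoch loop, so after the first epoch's model.eval() the BatchNorm and Dropout layers stay in eval mode during training as well. Calling model.train() again at the end of validation fixes that.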
