Validation loss is higher than training loss (for K-fold validation)

I am trying to implement k-fold validation in PyTorch with the MNIST dataset. I found a tutorial with Colab code here and followed the procedure it describes. Unfortunately, I am getting a validation loss that is much higher than the training loss.

Epoch:70/100 AVG Training Loss:0.156 AVG valid Loss:0.581 %
Epoch:71/100 AVG Training Loss:0.157 AVG valid Loss:0.610 %
Epoch:72/100 AVG Training Loss:0.150 AVG valid Loss:0.606 %
Epoch:73/100 AVG Training Loss:0.142 AVG valid Loss:0.585 %
Epoch:74/100 AVG Training Loss:0.155 AVG valid Loss:0.613 %
Epoch:75/100 AVG Training Loss:0.144 AVG valid Loss:0.593 %
Epoch:76/100 AVG Training Loss:0.150 AVG valid Loss:0.583 %
Epoch:77/100 AVG Training Loss:0.146 AVG valid Loss:0.564 %
Epoch:78/100 AVG Training Loss:0.151 AVG valid Loss:0.588 %
Epoch:79/100 AVG Training Loss:0.150 AVG valid Loss:0.588 %
Epoch:80/100 AVG Training Loss:0.142 AVG valid Loss:0.578 %
Epoch:81/100 AVG Training Loss:0.145 AVG valid Loss:0.550 %
Epoch:82/100 AVG Training Loss:0.146 AVG valid Loss:0.587 %
Epoch:83/100 AVG Training Loss:0.143 AVG valid Loss:0.584 %
Epoch:84/100 AVG Training Loss:0.137 AVG valid Loss:0.573 %
Epoch:85/100 AVG Training Loss:0.137 AVG valid Loss:0.587 %
Epoch:86/100 AVG Training Loss:0.146 AVG valid Loss:0.562 %
Epoch:87/100 AVG Training Loss:0.143 AVG valid Loss:0.578 %
Epoch:88/100 AVG Training Loss:0.147 AVG valid Loss:0.579 %
Epoch:89/100 AVG Training Loss:0.138 AVG valid Loss:0.538 %
Epoch:90/100 AVG Training Loss:0.142 AVG valid Loss:0.571 %
Epoch:91/100 AVG Training Loss:0.139 AVG valid Loss:0.566 %
Epoch:92/100 AVG Training Loss:0.136 AVG valid Loss:0.579 %
Epoch:93/100 AVG Training Loss:0.143 AVG valid Loss:0.531 %
Epoch:94/100 AVG Training Loss:0.133 AVG valid Loss:0.526 %
Epoch:95/100 AVG Training Loss:0.143 AVG valid Loss:0.564 %
Epoch:96/100 AVG Training Loss:0.138 AVG valid Loss:0.535 %
Epoch:97/100 AVG Training Loss:0.138 AVG valid Loss:0.543 %
Epoch:98/100 AVG Training Loss:0.137 AVG valid Loss:0.534 %
Epoch:99/100 AVG Training Loss:0.139 AVG valid Loss:0.538 %
Epoch:100/100 AVG Training Loss:0.135 AVG valid Loss:0.534 %

I have searched online, including the PyTorch forum, about this problem. From what I found, it could be caused by overfitting, too little data, or the model structure.

However, I am using the well-known MNIST digits dataset, the model is very simple, and there is plenty of data. So a validation loss higher than the training loss seems wrong to me. I think I may be doing something wrong, or my K-fold code may contain a logical error.

Data loading code

def data_loaders():
    train_data = datasets.MNIST(
        root = 'data',
        train = True,                         
        transform = transforms.ToTensor(), 
        download = True,            
    )
    test_data = datasets.MNIST(
        root = 'data', 
        train = False, 
        transform = transforms.ToTensor()
    )

    return train_data, test_data

Model training and validation loop

def train_epoch(model, train_dataloaders, optimizer, criterion):
    train_loss = 0.0
    model.train()
    for images, labels in train_dataloaders:
        b_x = images   
        b_y = labels  
        optimizer.zero_grad() 
        output = model(b_x)[0]          
        loss = criterion(output.squeeze(-1), b_y.float())    
        loss.backward() 
        optimizer.step()           
        train_loss +=loss.item() * images.size(0)

        return train_loss

def valid_epoch(model, valid_dataloaders, criterion):
    valid_loss = 0.0
    model.eval()
    for images, labels in valid_dataloaders:
        b_x = images  
        b_y = labels   
        output = model(b_x)[0]          
        loss = criterion(output.squeeze(-1), b_y.float())                
        valid_loss +=loss.item() * images.size(0)
        return valid_loss
        
def model_train():
    train_data, test_data = data_preprocess.data_loaders() 
    splits=KFold(n_splits=K_Fold,shuffle=True,random_state=42)
    foldperf={}
    for fold, (train_idx,val_idx) in enumerate(splits.split(np.arange(len(train_data)))):
        print('Fold {}'.format(fold + 1))

        train_sampler = SubsetRandomSampler(train_idx)
        valid_sampler = SubsetRandomSampler(val_idx)
        train_loader = DataLoader(train_data, batch_size=512, sampler=train_sampler)
        valid_loader = DataLoader(train_data, batch_size=512, sampler=valid_sampler)

        model = my_model.get_model() 
        optimizer = optim.SGD(params=model.parameters(), lr=0.02)
        criterion = nn.MSELoss()

        history = {'train_loss': [], 'valid_loss': []}

        for epoch in range(100):
            train_loss=train_epoch(model, train_loader, optimizer, criterion)
            valid_loss=valid_epoch(model,valid_loader, criterion)

            train_loss = train_loss / len(train_loader.sampler)
            valid_loss = valid_loss / len(valid_loader.sampler)

            print("Epoch:{}/{} AVG Training Loss:{:.3f} AVG valid Loss:{:.3f} %".format(epoch + 1, NB_EPOCS, train_loss, valid_loss))
            history['train_loss'].append(train_loss)
            history['valid_loss'].append(valid_loss)
        foldperf['fold{}'.format(fold+1)] = history 

    # Save Model 
    model_checkpoint_dir = os.path.join(address, "model.h5")
    torch.save(model.state_dict(), model_checkpoint_dir)

The model structure is taken from here; the original used CrossEntropyLoss and the Adam optimizer, whereas I use MSELoss and SGD. With MSELoss and SGD, the model works as expected without k-fold.

Any idea why I am getting a validation loss higher than the training loss? What should I do to solve the issue?

Thank you

I think there is a code indentation typo.

The return should be outside the for loop.
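To illustrate the fix, here is a minimal sketch of the two epoch functions with the return moved out of the loop. It is not the original poster's exact code: it assumes the model returns a tensor directly (the original indexes `model(b_x)[0]`), it uses a toy `nn.Linear` model and random tensors in place of MNIST, and it adds `torch.no_grad()` during validation, which the original does not have.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset


def train_epoch(model, loader, optimizer, criterion):
    """Run one training epoch and return the summed (not averaged) loss."""
    model.train()
    total = 0.0
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x).squeeze(-1), y.float())
        loss.backward()
        optimizer.step()
        total += loss.item() * x.size(0)
    return total  # returned once, after every batch has been processed


def valid_epoch(model, loader, criterion):
    """Run one validation epoch and return the summed (not averaged) loss."""
    model.eval()
    total = 0.0
    with torch.no_grad():  # gradients are not needed for validation
        for x, y in loader:
            loss = criterion(model(x).squeeze(-1), y.float())
            total += loss.item() * x.size(0)
    return total


# Toy stand-ins (hypothetical): random regression data and a linear model.
torch.manual_seed(0)
data = TensorDataset(torch.randn(64, 4), torch.randn(64))
loader = DataLoader(data, batch_size=16)
model = nn.Linear(4, 1)
optimizer = optim.SGD(model.parameters(), lr=0.02)
criterion = nn.MSELoss()

train_loss = train_epoch(model, loader, optimizer, criterion) / len(data)
valid_loss = valid_epoch(model, loader, criterion) / len(data)
print(f"train {train_loss:.3f}  valid {valid_loss:.3f}")
```

Because the summed loss now covers every batch, dividing by the number of samples in the split gives a proper per-sample average.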

@mMagmer thanks a lot for your reply and for catching the typo. I will change the code and let you know the updated loss.

However, does the indentation typo have any effect on saving the model?

torch.save(model.state_dict(), model_checkpoint_dir)

I ask because the previous model (the one with the indentation typo) performed terribly on the test dataset.

In the current version, each epoch only forwards one batch because of the return inside the for loop.
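This may also explain why the validation loss in the first log is roughly four times the training loss. With the early return, only one batch of at most 512 samples contributes to the sum, but the sum is still divided by the full sampler length, which differs between the two splits. The arithmetic below is only a sketch: the value of K_Fold is not given in the thread, so 5 is an assumption, and the per-sample loss is a hypothetical number taken to be equal on both splits.

```python
# Assumed values: K_Fold is not stated in the thread; 5 is a guess.
n_samples = 60000        # size of the MNIST training set
k_fold = 5               # assumption
batch_size = 512         # from the DataLoader calls in the question

train_n = n_samples * (k_fold - 1) // k_fold   # samples in the training split
valid_n = n_samples // k_fold                  # samples in the validation split

per_sample_loss = 1.0    # hypothetical, same on both splits

# With the early return, only one batch contributes to the sum,
# but the sum is divided by the whole split size.
buggy_train = per_sample_loss * batch_size / train_n
buggy_valid = per_sample_loss * batch_size / valid_n

ratio = buggy_valid / buggy_train
print(train_n, valid_n, ratio)
```

Under these assumptions the reported validation loss comes out about `k_fold - 1` times the reported training loss even when the model performs identically on both splits. This is consistent with the ratio in the first log (for example 0.534 / 0.135 at epoch 100).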

@mMagmer thanks, it works, though I am getting a slightly higher loss than with the model trained without k-fold validation.

Epoch:45/50 AVG Training Loss:1.538 AVG valid Loss:1.521 %
Epoch:46/50 AVG Training Loss:1.493 AVG valid Loss:1.605 %
Epoch:47/50 AVG Training Loss:1.473 AVG valid Loss:1.450 %
Epoch:48/50 AVG Training Loss:1.512 AVG valid Loss:1.462 %
Epoch:49/50 AVG Training Loss:1.429 AVG valid Loss:1.656 %
Epoch:50/50 AVG Training Loss:1.421 AVG valid Loss:1.400 %

Out of curiosity: say I save a model and then test it on the same test dataset multiple times. Will the predictions (model outputs) be the same each time?

For example, suppose a test image has target 7 and the model predicts 6 the first time. If I give it the same image a second time, will the output be 6 again (as I am using the same model)?

If you're using model.eval(), you should get the same result.
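A small sketch of why model.eval() matters here: layers such as dropout behave randomly in training mode but are disabled in evaluation mode. The network below is a hypothetical stand-in containing a dropout layer, not the model from the question, and it runs on the CPU.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical model with a dropout layer (not the model from the question).
model = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(16, 10),
)
x = torch.randn(4, 8)  # stand-in for a batch of test inputs

# Evaluation mode: dropout is a no-op, so repeated passes agree.
model.eval()
with torch.no_grad():
    first = model(x)
    second = model(x)
same_in_eval = torch.equal(first, second)

# Training mode: dropout samples a new random mask on every pass,
# so repeated passes will usually differ.
model.train()
with torch.no_grad():
    third = model(x)
    fourth = model(x)
same_in_train = torch.equal(third, fourth)

print("eval mode identical:", same_in_eval)
print("train mode identical:", same_in_train)
```

If the model has no dropout or batch-norm layers, the mode makes no difference to the output, but calling model.eval() before testing is still the safe habit.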