Validation loss is higher than training loss (for K-fold validation)

akib62 · January 25, 2022, 1:57am

I am trying to implement k-fold validation in PyTorch with the MNIST dataset. I found a tutorial with Colab code here and followed the same procedure. Unfortunately, my validation loss is much higher than my training loss.

Epoch:70/100 AVG Training Loss:0.156 AVG valid Loss:0.581 %
Epoch:71/100 AVG Training Loss:0.157 AVG valid Loss:0.610 %
Epoch:72/100 AVG Training Loss:0.150 AVG valid Loss:0.606 %
Epoch:73/100 AVG Training Loss:0.142 AVG valid Loss:0.585 %
Epoch:74/100 AVG Training Loss:0.155 AVG valid Loss:0.613 %
Epoch:75/100 AVG Training Loss:0.144 AVG valid Loss:0.593 %
Epoch:76/100 AVG Training Loss:0.150 AVG valid Loss:0.583 %
Epoch:77/100 AVG Training Loss:0.146 AVG valid Loss:0.564 %
Epoch:78/100 AVG Training Loss:0.151 AVG valid Loss:0.588 %
Epoch:79/100 AVG Training Loss:0.150 AVG valid Loss:0.588 %
Epoch:80/100 AVG Training Loss:0.142 AVG valid Loss:0.578 %
Epoch:81/100 AVG Training Loss:0.145 AVG valid Loss:0.550 %
Epoch:82/100 AVG Training Loss:0.146 AVG valid Loss:0.587 %
Epoch:83/100 AVG Training Loss:0.143 AVG valid Loss:0.584 %
Epoch:84/100 AVG Training Loss:0.137 AVG valid Loss:0.573 %
Epoch:85/100 AVG Training Loss:0.137 AVG valid Loss:0.587 %
Epoch:86/100 AVG Training Loss:0.146 AVG valid Loss:0.562 %
Epoch:87/100 AVG Training Loss:0.143 AVG valid Loss:0.578 %
Epoch:88/100 AVG Training Loss:0.147 AVG valid Loss:0.579 %
Epoch:89/100 AVG Training Loss:0.138 AVG valid Loss:0.538 %
Epoch:90/100 AVG Training Loss:0.142 AVG valid Loss:0.571 %
Epoch:91/100 AVG Training Loss:0.139 AVG valid Loss:0.566 %
Epoch:92/100 AVG Training Loss:0.136 AVG valid Loss:0.579 %
Epoch:93/100 AVG Training Loss:0.143 AVG valid Loss:0.531 %
Epoch:94/100 AVG Training Loss:0.133 AVG valid Loss:0.526 %
Epoch:95/100 AVG Training Loss:0.143 AVG valid Loss:0.564 %
Epoch:96/100 AVG Training Loss:0.138 AVG valid Loss:0.535 %
Epoch:97/100 AVG Training Loss:0.138 AVG valid Loss:0.543 %
Epoch:98/100 AVG Training Loss:0.137 AVG valid Loss:0.534 %
Epoch:99/100 AVG Training Loss:0.139 AVG valid Loss:0.538 %
Epoch:100/100 AVG Training Loss:0.135 AVG valid Loss:0.534 %

I have searched online, including the PyTorch forum, about this problem. From what I found, it could be caused by overfitting, a lack of data, or the model architecture.

Since I am using a well-known dataset (MNIST digits), the model is very simple, and there is plenty of data, a validation loss this much higher than the training loss seems wrong to me. I suspect I am doing something wrong, or that my K-fold code contains a logical error.

Data loading code

from torchvision import datasets, transforms

def data_loaders():
    train_data = datasets.MNIST(
        root = 'data',
        train = True,                         
        transform = transforms.ToTensor(), 
        download = True,            
    )
    test_data = datasets.MNIST(
        root = 'data', 
        train = False, 
        transform = transforms.ToTensor()
    )

    return train_data, test_data

Model training and validation loop

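import os

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.model_selection import KFold
from torch.utils.data import DataLoader, SubsetRandomSampler

def train_epoch(model, train_dataloaders, optimizer, criterion):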
    train_loss = 0.0
    model.train()
    for images, labels in train_dataloaders:
        b_x = images   
        b_y = labels  
        optimizer.zero_grad() 
        output = model(b_x)[0]          
        loss = criterion(output.squeeze(-1), b_y.float())    
        loss.backward() 
        optimizer.step()           
        train_loss +=loss.item() * images.size(0)

        return train_loss

def valid_epoch(model, valid_dataloaders, criterion):
    valid_loss = 0.0
    model.eval()
    for images, labels in valid_dataloaders:
        b_x = images  
        b_y = labels   
        output = model(b_x)[0]          
        loss = criterion(output.squeeze(-1), b_y.float())                
        valid_loss +=loss.item() * images.size(0)
        return valid_loss
        
def model_train():
    train_data, test_data = data_preprocess.data_loaders() 
    splits=KFold(n_splits=K_Fold,shuffle=True,random_state=42)
    foldperf={}
    for fold, (train_idx,val_idx) in enumerate(splits.split(np.arange(len(train_data)))):
        print('Fold {}'.format(fold + 1))

        train_sampler = SubsetRandomSampler(train_idx)
        valid_sampler = SubsetRandomSampler(val_idx)
        train_loader = DataLoader(train_data, batch_size=512, sampler=train_sampler)
        valid_loader = DataLoader(train_data, batch_size=512, sampler=valid_sampler)

        model = my_model.get_model() 
        optimizer = optim.SGD(params=model.parameters(), lr=0.02)
        criterion = nn.MSELoss()

        history = {'train_loss': [], 'valid_loss': []}

        for epoch in range(100):
            train_loss=train_epoch(model, train_loader, optimizer, criterion)
            valid_loss=valid_epoch(model,valid_loader, criterion)

            train_loss = train_loss / len(train_loader.sampler)
            valid_loss = valid_loss / len(valid_loader.sampler)

            print("Epoch:{}/{} AVG Training Loss:{:.3f} AVG valid Loss:{:.3f} %".format(epoch + 1, NB_EPOCS, train_loss, valid_loss))
            history['train_loss'].append(train_loss)
            history['valid_loss'].append(valid_loss)
        foldperf['fold{}'.format(fold+1)] = history 

    # Save Model 
    model_checkpoint_dir = os.path.join(address, "model.h5")
    torch.save(model.state_dict(), model_checkpoint_dir)
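Note that model_train() collects every fold's history in foldperf but saves only the model from the last fold. A common way to report a k-fold run is to average the final-epoch losses across folds. The sketch below assumes the foldperf layout built above; summarize_folds is a hypothetical helper, not something from the tutorial.

```python
from statistics import mean, stdev


def summarize_folds(foldperf):
    """Hypothetical helper: average the last-epoch losses over all folds.

    Expects the layout built in model_train():
    {'fold1': {'train_loss': [...], 'valid_loss': [...]}, 'fold2': ...}
    """
    final_train = [history['train_loss'][-1] for history in foldperf.values()]
    final_valid = [history['valid_loss'][-1] for history in foldperf.values()]
    summary = {
        'mean_train_loss': mean(final_train),
        'mean_valid_loss': mean(final_valid),
    }
    # The sample standard deviation needs at least two folds
    if len(final_valid) > 1:
        summary['std_valid_loss'] = stdev(final_valid)
    return summary
```

Calling print(summarize_folds(foldperf)) after the fold loop would then give one number per metric for the whole cross-validation run, instead of judging it by the last fold alone.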

I took the model structure from here. The tutorial used CrossEntropyLoss and the Adam optimizer, whereas I used MSELoss and SGD. Even so, with MSELoss and SGD the model works as expected without k-fold.
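The two losses expect different targets, which matters when comparing numbers against the tutorial. CrossEntropyLoss takes logits of shape (N, C) and integer class indices of shape (N,); MSELoss takes a prediction and a float target of the same shape. Comparing output.squeeze(-1) with b_y.float(), as the code above does, treats each digit as a number, so predicting 6 for a 7 costs less than predicting 1. This sketch assumes my_model.get_model() produces one output per image (the post does not show the model); the random tensors are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
num_classes = 10
labels = torch.tensor([7, 2, 1, 0])  # integer class labels, as MNIST provides

# CrossEntropyLoss: raw logits of shape (N, C), integer targets of shape (N,)
logits = torch.randn(len(labels), num_classes)
ce_loss = nn.CrossEntropyLoss()(logits, labels)

# MSELoss on the label itself: one output per image (assumed), float target,
# same shape after squeeze(-1). The digit is treated as a continuous value.
single_output = torch.randn(len(labels), 1)
mse_on_labels = nn.MSELoss()(single_output.squeeze(-1), labels.float())

# Predicting 6 for a 7 is penalized less than predicting 1 for a 7
close_miss = nn.MSELoss()(torch.tensor([6.0]), torch.tensor([7.0]))  # (6-7)^2 = 1
far_miss = nn.MSELoss()(torch.tensor([1.0]), torch.tensor([7.0]))    # (1-7)^2 = 36

# An MSE variant that treats digits as classes: probabilities vs one-hot targets
one_hot = F.one_hot(labels, num_classes).float()
mse_on_one_hot = nn.MSELoss()(logits.softmax(dim=1), one_hot)
```

Either way works as a training signal, but the loss values of the two setups are on different scales and are not directly comparable.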

Any idea why the validation loss is higher than the training loss? What should I do to solve the issue?

Thank you

mMagmer · January 25, 2022, 7:50am

I think there is a code indentation typo.

akib62:

def train_epoch(model, train_dataloaders, optimizer, criterion):
    train_loss = 0.0
    model.train()
    for images, labels in train_dataloaders:
        b_x = images   
        b_y = labels  
        optimizer.zero_grad() 
        output = model(b_x)[0]          
        loss = criterion(output.squeeze(-1), b_y.float())    
        loss.backward() 
        optimizer.step()           
        train_loss +=loss.item() * images.size(0)

        return train_loss

def valid_epoch(model, valid_dataloaders, criterion):
    valid_loss = 0.0
    model.eval()
    for images, labels in valid_dataloaders:
        b_x = images  
        b_y = labels   
        output = model(b_x)[0]          
        loss = criterion(output.squeeze(-1), b_y.float())                
        valid_loss +=loss.item() * images.size(0)
        return valid_loss

The return statement should be outside the for loop.
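With the return moved out of the loop, the two helpers could look roughly like this. It is a sketch, not the poster's final code: it also wraps validation in torch.no_grad() (an addition not in the original) and keeps the original assumption that the model returns a tuple whose first element is the prediction. TinyModel and the random TensorDataset are hypothetical stand-ins for my_model.get_model() and MNIST.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train_epoch(model, train_dataloaders, optimizer, criterion):
    train_loss = 0.0
    model.train()
    for images, labels in train_dataloaders:
        optimizer.zero_grad()
        output = model(images)[0]  # assumes model returns (prediction, ...)
        loss = criterion(output.squeeze(-1), labels.float())
        loss.backward()
        optimizer.step()
        # Weight by batch size so the caller can divide by the sample count
        train_loss += loss.item() * images.size(0)
    return train_loss  # after the loop: every batch has been processed


def valid_epoch(model, valid_dataloaders, criterion):
    valid_loss = 0.0
    model.eval()
    with torch.no_grad():  # no gradients are needed for validation
        for images, labels in valid_dataloaders:
            output = model(images)[0]
            loss = criterion(output.squeeze(-1), labels.float())
            valid_loss += loss.item() * images.size(0)
    return valid_loss


class TinyModel(nn.Module):
    """Hypothetical stand-in that returns (prediction, features) like the post assumes."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 1)

    def forward(self, x):
        return self.fc(x), x


# Toy run: 10 random samples, batch size 4 -> batches of 4, 4 and 2
torch.manual_seed(0)
data = TensorDataset(torch.randn(10, 4), torch.randint(0, 10, (10,)))
loader = DataLoader(data, batch_size=4)
model = TinyModel()
opt = torch.optim.SGD(model.parameters(), lr=0.02)
crit = nn.MSELoss()
# len(loader.dataset) plays the role of len(loader.sampler) in the post
avg_train = train_epoch(model, loader, opt, crit) / len(loader.dataset)
avg_valid = valid_epoch(model, loader, crit) / len(loader.dataset)
```

Because each batch's mean loss is multiplied by its own size, dividing the sum by the sample count gives the true per-sample average even when the last batch is smaller.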

akib62 · January 25, 2022, 12:28pm

@mMagmer thanks a lot for your reply and for catching the typo. I will change the code and let you know the updated loss.

Also, does the indentation typo have any effect on the saved model?

torch.save(model.state_dict(), model_checkpoint_dir)

I ask because the model saved with the indentation typo performed terribly on the test dataset.

mMagmer · January 25, 2022, 12:49pm

In the current version, each epoch forwards only one batch, because the return statement is inside the for loop.
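This also accounts for the gap in the first log. Because each call returns after one batch, the summed loss covers at most 512 samples, but it is then divided by len(train_loader.sampler) or len(valid_loader.sampler). The validation fold is smaller than the training fold, so the same per-batch loss is divided by a smaller number and looks larger. Assuming K_Fold = 5 (its value is not shown in the post), the folds hold 48,000 and 12,000 of MNIST's 60,000 training images, a factor of 4, which roughly matches 0.135 vs 0.534 at epoch 100. It also bears on the saving question: 100 epochs amounted to only 100 SGD steps, so the saved weights reflect far less training than intended. The sketch below mimics that bookkeeping with a hypothetical constant per-batch loss.

```python
import math


def reported_average(per_batch_loss, batch_size, n_samples, return_inside_loop):
    """Mimic the post's bookkeeping for a constant (hypothetical) per-batch loss.

    Sums loss * batch size over the batches actually processed, then divides
    by n_samples, as the post does with len(loader.sampler).
    """
    n_batches = math.ceil(n_samples / batch_size)
    batches_processed = 1 if return_inside_loop else n_batches
    total, remaining = 0.0, n_samples
    for _ in range(batches_processed):
        this_batch = min(batch_size, remaining)  # the last batch may be smaller
        total += per_batch_loss * this_batch
        remaining -= this_batch
    return total / n_samples


# Assumed split: K_Fold = 5 over MNIST's 60,000 training images
n_train, n_valid, batch_size = 48_000, 12_000, 512
buggy_train = reported_average(1.0, batch_size, n_train, return_inside_loop=True)
buggy_valid = reported_average(1.0, batch_size, n_valid, return_inside_loop=True)
fixed_train = reported_average(1.0, batch_size, n_train, return_inside_loop=False)
fixed_valid = reported_average(1.0, batch_size, n_valid, return_inside_loop=False)
```

With an identical per-batch loss, buggy_valid comes out four times buggy_train, while the fixed versions both report the true value of 1.0.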

akib62 · January 25, 2022, 1:10pm

@mMagmer Thanks, it works, though the loss is somewhat higher than for the model trained without k-fold validation.

Epoch:45/50 AVG Training Loss:1.538 AVG valid Loss:1.521 %
Epoch:46/50 AVG Training Loss:1.493 AVG valid Loss:1.605 %
Epoch:47/50 AVG Training Loss:1.473 AVG valid Loss:1.450 %
Epoch:48/50 AVG Training Loss:1.512 AVG valid Loss:1.462 %
Epoch:49/50 AVG Training Loss:1.429 AVG valid Loss:1.656 %
Epoch:50/50 AVG Training Loss:1.421 AVG valid Loss:1.400 %

Out of curiosity: say I save a model and then evaluate it on the same test dataset several times. Will the predictions (model outputs) be the same every time?

For example, suppose a test image has target 7 and the model predicts 6 the first time. If I feed the same image again, will the output be 6 again, since it is the same model?

mMagmer · January 25, 2022, 1:45pm

If you're using model.eval(), you should get the same result.
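A quick way to check this: in eval mode, dropout is disabled (and batch norm uses its running statistics), so repeated forward passes on the same input give the same output, whereas in train mode dropout draws a new random mask on each call. The small model below is a hypothetical stand-in for the saved model. On the CPU the eval-mode outputs should match exactly; some GPU kernels can be nondeterministic in the last floating-point bits, which would only matter for near-tie predictions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical stand-in for the saved model: a small MLP with dropout
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(64, 10),
)
image = torch.rand(1, 1, 28, 28)  # one fake MNIST-sized image

model.eval()  # dropout off (batch norm, if present, would use running stats)
with torch.no_grad():
    eval_out_1 = model(image)
    eval_out_2 = model(image)

model.train()  # dropout on: a new random mask is drawn on every forward pass
with torch.no_grad():
    train_out_1 = model(image)
    train_out_2 = model(image)
```

When testing a model restored from the saved state_dict, call model.eval() after load_state_dict() and before running the test set.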