I am trying to implement k-fold cross-validation in PyTorch on the MNIST dataset. I found a tutorial with accompanying Colab code and followed the procedure it describes. Unfortunately, my validation loss is much higher than my training loss:
Epoch:70/100 AVG Training Loss:0.156 AVG valid Loss:0.581 %
Epoch:71/100 AVG Training Loss:0.157 AVG valid Loss:0.610 %
Epoch:72/100 AVG Training Loss:0.150 AVG valid Loss:0.606 %
Epoch:73/100 AVG Training Loss:0.142 AVG valid Loss:0.585 %
Epoch:74/100 AVG Training Loss:0.155 AVG valid Loss:0.613 %
Epoch:75/100 AVG Training Loss:0.144 AVG valid Loss:0.593 %
Epoch:76/100 AVG Training Loss:0.150 AVG valid Loss:0.583 %
Epoch:77/100 AVG Training Loss:0.146 AVG valid Loss:0.564 %
Epoch:78/100 AVG Training Loss:0.151 AVG valid Loss:0.588 %
Epoch:79/100 AVG Training Loss:0.150 AVG valid Loss:0.588 %
Epoch:80/100 AVG Training Loss:0.142 AVG valid Loss:0.578 %
Epoch:81/100 AVG Training Loss:0.145 AVG valid Loss:0.550 %
Epoch:82/100 AVG Training Loss:0.146 AVG valid Loss:0.587 %
Epoch:83/100 AVG Training Loss:0.143 AVG valid Loss:0.584 %
Epoch:84/100 AVG Training Loss:0.137 AVG valid Loss:0.573 %
Epoch:85/100 AVG Training Loss:0.137 AVG valid Loss:0.587 %
Epoch:86/100 AVG Training Loss:0.146 AVG valid Loss:0.562 %
Epoch:87/100 AVG Training Loss:0.143 AVG valid Loss:0.578 %
Epoch:88/100 AVG Training Loss:0.147 AVG valid Loss:0.579 %
Epoch:89/100 AVG Training Loss:0.138 AVG valid Loss:0.538 %
Epoch:90/100 AVG Training Loss:0.142 AVG valid Loss:0.571 %
Epoch:91/100 AVG Training Loss:0.139 AVG valid Loss:0.566 %
Epoch:92/100 AVG Training Loss:0.136 AVG valid Loss:0.579 %
Epoch:93/100 AVG Training Loss:0.143 AVG valid Loss:0.531 %
Epoch:94/100 AVG Training Loss:0.133 AVG valid Loss:0.526 %
Epoch:95/100 AVG Training Loss:0.143 AVG valid Loss:0.564 %
Epoch:96/100 AVG Training Loss:0.138 AVG valid Loss:0.535 %
Epoch:97/100 AVG Training Loss:0.138 AVG valid Loss:0.543 %
Epoch:98/100 AVG Training Loss:0.137 AVG valid Loss:0.534 %
Epoch:99/100 AVG Training Loss:0.139 AVG valid Loss:0.538 %
Epoch:100/100 AVG Training Loss:0.135 AVG valid Loss:0.534 %
I have searched online, including the PyTorch forum, about this problem. From what I found, it can happen because of overfitting, too little data, or the model structure.
I am using the well-known MNIST digits dataset, the model is very simple, and there is plenty of data, so a validation loss this much higher than the training loss seems wrong to me. I suspect I am doing something wrong, or that my k-fold code contains a logical error.
Data loading code
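from torchvision import datasets, transforms

def data_loaders():
    # Official MNIST training split (60,000 images), converted to tensors
    train_data = datasets.MNIST(
        root='data',
        train=True,
        transform=transforms.ToTensor(),
        download=True,
    )
    # Official MNIST test split (10,000 images)
    test_data = datasets.MNIST(
        root='data',
        train=False,
        transform=transforms.ToTensor(),
    )
    return train_data, test_data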
Model training and validation loop
def train_epoch(model, train_dataloaders, optimizer, criterion):
    train_loss = 0.0
    model.train()
    for images, labels in train_dataloaders:
        b_x = images
        b_y = labels
        optimizer.zero_grad()
        output = model(b_x)[0]
        loss = criterion(output.squeeze(-1), b_y.float())
        loss.backward()
        optimizer.step()
        # Weight the batch-mean loss by the batch size so the caller can
        # divide by the number of samples
        train_loss += loss.item() * images.size(0)
    return train_loss
def valid_epoch(model, valid_dataloaders, criterion):
    valid_loss = 0.0
    model.eval()
    for images, labels in valid_dataloaders:
        b_x = images
        b_y = labels
        output = model(b_x)[0]
        loss = criterion(output.squeeze(-1), b_y.float())
        # Same sample-weighted accumulation as in train_epoch
        valid_loss += loss.item() * images.size(0)
    return valid_loss
import os

import numpy as np
import torch
from sklearn.model_selection import KFold
from torch import nn, optim
from torch.utils.data import DataLoader, SubsetRandomSampler

def model_train():
    train_data, test_data = data_preprocess.data_loaders()
    splits = KFold(n_splits=K_Fold, shuffle=True, random_state=42)
    foldperf = {}
    for fold, (train_idx, val_idx) in enumerate(splits.split(np.arange(len(train_data)))):
        print('Fold {}'.format(fold + 1))
        # Both loaders read from train_data; the samplers restrict them to this fold's indices
        train_sampler = SubsetRandomSampler(train_idx)
        valid_sampler = SubsetRandomSampler(val_idx)
        train_loader = DataLoader(train_data, batch_size=512, sampler=train_sampler)
        valid_loader = DataLoader(train_data, batch_size=512, sampler=valid_sampler)
        # Fresh model and optimizer for every fold
        model = my_model.get_model()
        optimizer = optim.SGD(params=model.parameters(), lr=0.02)
        criterion = nn.MSELoss()
        history = {'train_loss': [], 'valid_loss': []}
        for epoch in range(100):
            train_loss = train_epoch(model, train_loader, optimizer, criterion)
            valid_loss = valid_epoch(model, valid_loader, criterion)
            # Convert the summed losses into per-sample averages
            train_loss = train_loss / len(train_loader.sampler)
            valid_loss = valid_loss / len(valid_loader.sampler)
            print("Epoch:{}/{} AVG Training Loss:{:.3f} AVG valid Loss:{:.3f} %".format(epoch + 1, NB_EPOCS, train_loss, valid_loss))
            history['train_loss'].append(train_loss)
            history['valid_loss'].append(valid_loss)
        foldperf['fold{}'.format(fold + 1)] = history
    # Save Model
    model_checkpoint_dir = os.path.join(address, "model.h5")
    torch.save(model.state_dict(), model_checkpoint_dir)
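One thing that can be ruled out independently of the model is leakage between the folds. The sketch below is a minimal, pure-Python sanity check of that property: every fold's training and validation indices should be disjoint, and the validation folds together should cover the dataset exactly once. The helpers `kfold_indices` and `check_folds` are hypothetical names introduced only for this illustration; `kfold_indices` is a simplified stand-in for a shuffled k-fold splitter, not sklearn's implementation.

```python
import random


def kfold_indices(n_samples, n_splits, seed=42):
    """Hypothetical stand-in for a shuffled k-fold splitter.

    Yields (train_idx, val_idx) lists of integer indices for each fold.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [
        n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
        for i in range(n_splits)
    ]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


def check_folds(n_samples, n_splits):
    """Return True if no fold leaks and the validation folds tile the dataset."""
    seen_in_validation = set()
    for train_idx, val_idx in kfold_indices(n_samples, n_splits):
        train_set, val_set = set(train_idx), set(val_idx)
        if train_set & val_set:
            return False  # a sample appears in both train and validation
        if len(train_set) + len(val_set) != n_samples:
            return False  # some sample is missing from this fold
        if seen_in_validation & val_set:
            return False  # a sample is validated in more than one fold
        seen_in_validation |= val_set
    return seen_in_validation == set(range(n_samples))


if __name__ == "__main__":
    print(check_folds(n_samples=60000, n_splits=5))
```

The same three checks could be applied to the `train_idx` and `val_idx` arrays produced inside `model_train` before the samplers are built.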
I took the model structure from an online example that uses CrossEntropyLoss and the Adam optimizer, whereas I use MSELoss and SGD. However, with MSELoss and SGD the model works as expected when I train it without k-fold.
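To make the difference between the two loss setups concrete, here is a minimal pure-Python sketch of what each loss computes for a single image. It assumes, as the `squeeze(-1)` call suggests but the post does not confirm, that the MSE setup compares one scalar output per image with the digit label treated as a number, while the cross-entropy setup scores ten class logits. The function names are hypothetical and exist only for this illustration.

```python
import math


def mse_on_label(prediction, label):
    """Squared error between one scalar prediction and the digit label as a number."""
    return (prediction - label) ** 2


def cross_entropy(logits, label):
    """Negative log-softmax probability of the true class (numerically stable)."""
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[label]


if __name__ == "__main__":
    # Under MSE, predicting 8.0 for a true 9 costs far less than predicting 1.0,
    # although both would be counted as misclassifications.
    print(mse_on_label(8.0, 9), mse_on_label(1.0, 9))
    # Under cross-entropy, ten equal logits give the same loss for every label.
    print(cross_entropy([0.0] * 10, 9))
```

Because the MSE value depends on the numeric distance between digits, loss values from the two setups are on different scales and are not directly comparable.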
Any idea why my validation loss is higher than my training loss, and what I should do to fix it?
Thank you.