I am working with hyperspectral images as the input to my NN.
The shape of my dataset is (65000, 1, 92): 65000 is the number of samples (signals), 1 is the channel, and 92 is the length of each signal.
When I trained my model, it turned out that the test error is lower than the training error.
I assume it has to do with how I shuffle my data, because the signals are distributed randomly across my pixels (every single pixel is a signal).
Below you can see my dataloader code. Does anyone have any idea if it is okay?
How large is the gap between the training and test error?
If your model uses e.g. dropout layers, the loss might be (slightly) higher during training, since dropout reduces the effective capacity of the model while it is active.
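To make the dropout point concrete, here is a minimal sketch (the standalone `nn.Dropout` layer and the all-ones input are made up for illustration, not taken from your model). It only shows that the layer perturbs activations in `train()` mode and is a no-op in `eval()` mode, which is why a training loss measured in `train()` mode can come out higher:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in: a single dropout layer applied to one signal of length 92.
drop = nn.Dropout(p=0.5)
x = torch.ones(1, 92)

drop.train()
out_train = drop(x)  # entries are either zeroed or scaled by 1 / (1 - p) = 2

drop.eval()
out_eval = drop(x)   # dropout is disabled, so the input passes through unchanged

print("zeros in train mode:", int((out_train == 0).sum()))
print("zeros in eval mode: ", int((out_eval == 0).sum()))
```

If your model has no dropout (or batch norm) layers, this effect does not apply and the gap has to come from somewhere else.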
The usage of torch.utils.data.random_split should be correct.
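For reference, a small sketch of how such a split is commonly set up. The 80/20 ratio, the batch size, and the random stand-in data are assumptions, since the original dataloader code is not shown here; `TensorDataset` also returns one-element tuples, which may differ from your dataset:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader, random_split

# Stand-in data with the layout from the question, but fewer samples: (N, 1, 92).
signals = torch.randn(1000, 1, 92)
dataset = TensorDataset(signals)

# Assumed 80/20 split; a seeded generator makes the split reproducible.
generator = torch.Generator().manual_seed(0)
train_set, test_set = random_split(dataset, [800, 200], generator=generator)

train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
test_loader = DataLoader(test_set, batch_size=64, shuffle=False)

# Sanity check: the two subsets should not share any sample index.
overlap = set(train_set.indices) & set(test_set.indices)
print(len(train_set), len(test_set), len(overlap))
```

Checking that the index sets are disjoint is a cheap way to rule out leakage between the training and test subsets.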
How are you calculating the training loss? Are you printing the average of the complete training epoch or is this the value of the last batch?
In the former case, are you also observing the training loss for each batch?
I used these:
### Training loop: a learning rate scheduler was used to find the best learning rate. Every 250 epochs the learning rate changes by a factor of 0.1
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=320, gamma=0.7, last_epoch=-1)
epoch_num = 0
train_error = []
test_error = []
best_error = 100
for epoch in range(num_epochs):
    print(scheduler.get_last_lr())
    loss_total = 0
    test_loss_total = 0
    model.train()
    for batch_idx, sample in enumerate(train_loader):
        inp = sample
        inp = inp.cuda()
        output = model(inp).cuda()
        loss = criterion(output, inp)
        loss_total += loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
    loss_total = loss_total / (batch_idx+1)
    train_error.append(loss_total.item())
    model.eval()
    with torch.no_grad():
        for batch_idx, sample in enumerate(test_loader):
            inp = sample
            inp = inp.cuda()
            output = model(inp).cuda()
            loss = criterion(output, inp)
            test_loss_total += loss
    test_loss_total = test_loss_total / (batch_idx+1)
    if loss < best_error:
        best_error = loss
        best_epoch = epoch
        print('Best loss at epoch', best_epoch)
        model_save_name = '2020-05-25 Li_sample_no_image_processing (5)'
        path = F"/content/drive/My Drive/{model_save_name}"
        torch.save(model.state_dict(), path)
    test_error.append(test_loss_total.item())
    if epoch%10 == 9:
        epoch_num+=epoch_num
        print('\r Train Epoch : {}/{} \tLoss : {:.4f}'.format(epoch+1, num_epochs, loss_total))
        print('\r Test Epoch : {}/{} \tLoss : {:.4f}'.format(epoch+1, num_epochs, test_loss_total))
You could try adding another loop after each training epoch that calculates the training loss with the already updated model, without training it any further. This would give the current training loss for this epoch without the running average, which might introduce a bias.
Let me know whether this reduces the gap.
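A rough sketch of what that extra loop could look like. The helper name `evaluate`, the tiny autoencoder, and the random data are placeholders for illustration only; the helper also assumes the loader yields the input tensor directly and that the criterion averages over the batch (as `nn.MSELoss()` does by default):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader


def evaluate(model, loader, criterion, device):
    """Average reconstruction loss over `loader`, with the model in eval mode."""
    model.eval()
    total, count = 0.0, 0
    with torch.no_grad():
        for inp in loader:
            inp = inp.to(device)
            loss = criterion(model(inp), inp)
            # Weight by batch size so a smaller last batch is not over-counted.
            total += loss.item() * inp.size(0)
            count += inp.size(0)
    return total / count


# Hypothetical tiny autoencoder and random data, only to show the call.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Sequential(nn.Flatten(), nn.Linear(92, 8), nn.ReLU(),
                      nn.Linear(8, 92), nn.Unflatten(1, (1, 92))).to(device)
data = torch.randn(256, 1, 92)
train_loader = DataLoader(data, batch_size=50, shuffle=True)

train_loss_after_epoch = evaluate(model, train_loader, nn.MSELoss(), device)
print(f"train loss (eval mode): {train_loss_after_epoch:.4f}")
```

In the training script you would call it once on `train_loader` and once on `test_loader` at the end of each epoch, so both numbers come from the same weights and the same mode.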
PS: you can post code snippets by wrapping them into three backticks ```
Thank you Patrick.
May I ask you to clarify this a bit more? Do you mean calculating the final error, i.e. on the output of the NN? And how would this explain the problem I described, a smaller error on the test set than on the training set?
I think the gap might come from the training loss calculation which is using the running average of the batch losses in the current epoch (while the model is being trained), while the validation loss is calculated after the current epoch is finished.
To exclude this possibility, you could also calculate the training loss after the epoch has finished (the same way the validation loss is calculated) and check whether the gap narrows.
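Here is a toy experiment that illustrates the effect. Everything in it is an assumption for demonstration purposes (synthetic low-rank signals, a small autoencoder without dropout, Adam with a learning rate of 1e-2), so it says nothing about the size of the gap in your setup:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

torch.manual_seed(0)

# Synthetic low-rank "signals" of length 92 (not real hyperspectral data).
data = torch.randn(512, 4) @ torch.randn(4, 92)
loader = DataLoader(data, batch_size=32, shuffle=True)

# Hypothetical small autoencoder without dropout or batch norm.
model = nn.Sequential(nn.Linear(92, 16), nn.ReLU(), nn.Linear(16, 92))
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

# 1) Running average, collected while the weights are still changing.
model.train()
running = 0.0
for inp in loader:
    optimizer.zero_grad()
    loss = criterion(model(inp), inp)
    loss.backward()
    optimizer.step()
    running += loss.item()
running_avg = running / len(loader)

# 2) Same data, evaluated once with the weights reached at the end of the epoch.
model.eval()
with torch.no_grad():
    post = sum(criterion(model(inp), inp).item() for inp in loader) / len(loader)

print(f"running average during epoch: {running_avg:.4f}")
print(f"loss after the epoch:         {post:.4f}")
```

The running average includes the losses of the early batches, which were computed with worse weights, so it tends to sit above the loss measured after the epoch. The validation loss is always measured the second way, which can make it look better than the training loss.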