I am still pretty new to neural networks and PyTorch. Now that my neural net seems to be producing good results, I am having trouble interpreting the losses (the training loss is consistently higher than the validation loss).
But first, here is the information about the data I have been working with:
3d CT images (so only one channel)
always same sizes for height (512) and width (512)
varying size for depth
Output should be denoised images.
Since the images are of course too big to train on as a whole, I had to work with patches and took the same number of random sample patches from each image.
For training I am using the sample patches of 47 images and for validation those of 12 images.
In order to compare the losses obtained during training and validation to the test loss on the complete images, I would like loss plots that show the average loss per voxel (this also makes sense to me since the images differ in depth).
Now I already tried setting reduction='sum' and summing all mini-batch losses, which at the end of each epoch are divided by the total number of training or validation patches and then divided by the number of voxels per patch.
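In code, that accumulation looks roughly like this (a sketch with placeholder names and toy data, not my actual training script):

```python
import torch
import torch.nn as nn

# reduction='sum' returns the summed loss over every voxel in the batch
criterion = nn.MSELoss(reduction='sum')

def epoch_loss_per_voxel(model, loader, num_patches, voxels_per_patch):
    """Accumulate summed batch losses, then average per voxel at epoch end."""
    total = 0.0
    for noisy, clean in loader:
        total += criterion(model(noisy), clean).item()
    # divide by (number of patches) * (voxels per patch) = total voxels seen
    return total / (num_patches * voxels_per_patch)

# Toy check with an identity "model" and random patches (placeholder data)
torch.manual_seed(0)
loader = [(torch.rand(1, 1, 4, 8, 8), torch.rand(1, 1, 4, 8, 8))
          for _ in range(5)]
avg = epoch_loss_per_voxel(nn.Identity(), loader, num_patches=5,
                           voxels_per_patch=4 * 8 * 8)
```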
But somehow this always gives me a higher loss for training than for validation. Especially in the first epoch, the training loss is easily 10 times higher.
I already tried a lot of loss calculations, but nothing seems right. Could someone please try to help me out?
Could you post the loss calculation and in particular how the average loss per voxel is calculated?
You can sometimes observe a higher training loss than validation loss if, e.g., dropout is used and the training model thus has less capacity.
However, a 10x increase seems too large.
Since I didn’t use any dropout, I guess something about my calculation is very off.
In the end I thought of calculating how far off my resulting images are from the ground truth by using the L1 and L2 distance per voxel, for example like this (in order to be able to compare statistics of different images, with different sizes, to each other):
Ground-truth image g
Output image o
--> images as NumPy arrays:
import numpy as np
abs_difference = np.absolute(g - o)
l1 = np.mean(abs_difference)
l2 = np.sqrt(np.mean(abs_difference * abs_difference))
If my idea is completely off, please tell me.
So now I switched to L1Loss, because I realized my mistake: MSELoss will not give me an L2 distance per voxel unless I implement my own loss function (I should have realized that earlier).
Still, with L1Loss and reduction='sum', summing all losses of one epoch and then dividing by ('number of training or validation patches' times 'number of voxels per patch') gives me a training loss twice as high as the validation loss in the beginning. Is this problematic? It does improve quickly though (similar values for training and validation).
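As a sanity check on the division itself: summing with reduction='sum' and dividing by the voxel count should match what reduction='mean' reports for a single batch (shapes below are made up):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
output = torch.rand(2, 1, 4, 8, 8)  # two patches; shapes are made up
target = torch.rand(2, 1, 4, 8, 8)

summed = nn.L1Loss(reduction='sum')(output, target)
per_voxel = summed / output.numel()

# reduction='mean' averages over all elements, i.e. per voxel directly
mean_loss = nn.L1Loss(reduction='mean')(output, target)
assert torch.allclose(per_voxel, mean_loss)
```

So the per-voxel averaging itself is consistent; the remaining difference between training and validation must come from elsewhere.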
If you are accumulating the training loss for the complete epoch and calculate the average at the end of each epoch, the training loss might be higher, since your model was trained during the epoch.
E.g. the first batch might yield a training loss of 100, while the last batch yields 10. While the validation loss would be calculated using the "trained" model, the training loss would be the average over the complete epoch.
To verify it, you could recalculate the training loss after each epoch and compare it to the validation loss.
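To make that check concrete, a recomputation pass could look like this (a sketch with placeholder names and toy data; it re-runs the training set through the model in eval mode after the epoch has finished):

```python
import torch
import torch.nn as nn

@torch.no_grad()
def recomputed_loss_per_voxel(model, loader):
    """Re-evaluate a dataset with the weights as they are *after* the epoch."""
    model.eval()
    criterion = nn.L1Loss(reduction='sum')
    total, voxels = 0.0, 0
    for noisy, clean in loader:
        total += criterion(model(noisy), clean).item()
        voxels += clean.numel()
    return total / voxels

# Toy check with an identity model and random patches (placeholder data)
torch.manual_seed(0)
loader = [(torch.rand(1, 1, 4, 8, 8), torch.rand(1, 1, 4, 8, 8))
          for _ in range(3)]
val = recomputed_loss_per_voxel(nn.Identity(), loader)
```

If this recomputed training loss is close to the validation loss, the gap you saw was just the model improving within the epoch.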