I am training a model with an unsupervised loss function:
For validation, I am computing and printing the same loss on the validation set once per epoch:
Although the general trend of my validation curve goes down, I am wondering why the validation loss is so noisy and varies so much from epoch to epoch. Could this indicate some kind of overfitting?
By the way, my results look fine (except for the occasional “bad” epoch), but I am looking for ways to improve them further.
I am already using weight decay for regularization.
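In case it matters, I enable it through the optimizer's `weight_decay` argument, roughly like this (the model and the hyperparameter values are just placeholders, not my actual setup):

```python
import torch
import torch.nn as nn

# placeholder model standing in for the real architecture
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 32))

# weight_decay adds an L2 penalty (weight_decay * param is added to the
# gradient in each update); lr and weight_decay here are illustrative
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
```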
If I understand the plot correctly, your training loss seems to be even noisier.
Which batch size are you using? Often larger batches yield a smoother loss curve.
Not necessarily. Although a larger batch size might yield smoother loss curves, the final performance might be worse than with noisier, smaller batches.
However, you could try an artificially larger batch size by accumulating gradients over several smaller batches, as described here, and compare the results; a rough sketch follows below.
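A minimal sketch of gradient accumulation, assuming a standard training loop (the model, data, and loss below are placeholders just to make it runnable):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# placeholder model, data, and loss
model = nn.Linear(16, 1)
loader = DataLoader(TensorDataset(torch.randn(64, 16), torch.randn(64, 1)),
                    batch_size=4)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

accumulation_steps = 4  # effective batch size = 4 * 4 = 16

optimizer.zero_grad()
for i, (data, target) in enumerate(loader):
    output = model(data)
    # scale the loss so the accumulated gradients match one large batch
    loss = criterion(output, target) / accumulation_steps
    loss.backward()  # .grad buffers accumulate across iterations
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()       # update with the accumulated gradients
        optimizer.zero_grad()  # reset for the next effective batch
```

This keeps the memory footprint of a small batch while the optimizer effectively sees gradients averaged over the larger batch.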
Alternatively, you could also use torch.utils.checkpoint to trade compute for memory, which would let you fit a genuinely larger batch size.
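A rough sketch of activation checkpointing, using a toy two-block model as a stand-in for a real architecture:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.block1 = nn.Sequential(nn.Linear(16, 256), nn.ReLU())
        self.block2 = nn.Sequential(nn.Linear(256, 1), nn.ReLU())

    def forward(self, x):
        # block1's intermediate activations are not stored; they are
        # recomputed during the backward pass, saving memory at the
        # cost of extra compute (use_reentrant=False is the mode
        # recommended in recent PyTorch versions)
        x = checkpoint(self.block1, x, use_reentrant=False)
        return self.block2(x)

model = Net()
x = torch.randn(8, 16, requires_grad=True)
model(x).mean().backward()
```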