I am trying to overfit a single batch to test whether my network is working as intended. I would have expected that the loss should keep decreasing as long as the learning rate isn’t too high. What I observe, however, is that the loss does decrease over time, but it fluctuates strongly. Is that a sign that I have a flaw in my architecture?
Use a small batch size (such as 2). Also, this test only tells you whether the model has enough capacity to learn the data, so if you can reach a loss of 0, you have passed the test.
I am trying that, but I can’t reach zero. My question is exactly this: should the loss strictly decrease while overfitting, provided the learning rate is sufficiently small, or can it vary?
It should decrease gradually. Some fluctuation is OK, but not strong fluctuation as in your case. If the model is correct, the fluctuation might be due to bad hyperparameters (probably the learning rate or momentum). Also, do not use any data augmentation, weight decay, dropout, or any other fancy regularization trick.
I can’t believe I overlooked that - I had a random augmentation activated within the DataLoader! Thanks for pointing that out!
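For anyone who wants to see the effect in isolation, here is a minimal sketch in plain Python. It is a toy stand-in, not the network from this thread: a two-parameter linear model is fit to one fixed batch with full-batch gradient descent, once on the unchanged batch and once with random noise added to the inputs at every step (a crude substitute for a random augmentation). The data, the learning rate, the noise level, and all function names are assumptions chosen only for illustration.

```python
import random


def mse_and_grads(w, b, xs, ys):
    """Mean squared error of y = w*x + b on one batch, plus its gradients."""
    n = len(xs)
    errs = [w * x + b - y for x, y in zip(xs, ys)]
    loss = sum(e * e for e in errs) / n
    grad_w = 2.0 * sum(e * x for e, x in zip(errs, xs)) / n
    grad_b = 2.0 * sum(errs) / n
    return loss, grad_w, grad_b


def train(xs, ys, steps=200, lr=0.05, noise=0.0, seed=0):
    """Gradient descent on a single batch; returns the loss seen at each step.

    noise > 0 perturbs the inputs anew at every step, mimicking a random
    augmentation applied when the batch is loaded (illustrative assumption).
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    losses = []
    for _ in range(steps):
        if noise > 0:
            batch_x = [x + rng.gauss(0.0, noise) for x in xs]
        else:
            batch_x = xs
        loss, grad_w, grad_b = mse_and_grads(w, b, batch_x, ys)
        losses.append(loss)
        w -= lr * grad_w
        b -= lr * grad_b
    return losses


def count_increases(losses, tol=1e-12):
    """Number of steps at which the loss went up compared to the previous step."""
    return sum(1 for prev, cur in zip(losses, losses[1:]) if cur > prev + tol)


# One fixed batch that the model can fit exactly (y = 2x + 1).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

clean = train(xs, ys)                 # same batch every step
augmented = train(xs, ys, noise=0.5)  # inputs re-randomized every step

print("increases without augmentation:", count_increases(clean))
print("increases with augmentation:   ", count_increases(augmented))
print("final loss without / with:", clean[-1], augmented[-1])
```

In this toy setting the loss on the unchanged batch goes down at every step, because the learning rate is small enough for this quadratic loss. With the per-step noise the model effectively sees a different batch each time, so the loss jumps around and cannot settle near zero. A real network with momentum or an adaptive optimizer can still show small fluctuations on a fixed batch, so treat this only as an illustration of why augmentation breaks the single-batch test.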