Can loss vary when overfitting a single batch?

spadel · October 10, 2020, 6:41pm

I am trying to overfit a single batch to test whether my network is working as intended. I expected the loss to keep decreasing as long as the learning rate isn’t too high. What I observe, however, is that the loss does decrease over time, but it fluctuates strongly. Is that a sign of a flaw in my architecture?

Kushaj · October 10, 2020, 7:27pm

Use a small batch size (like 2). Also, this test only tells you whether the model has enough capacity to learn the data, so if you can reach a loss of 0 (or very close to it), you have passed the test.

spadel · October 10, 2020, 7:53pm

I am trying that, but I can’t reach zero. My question is exactly this: should the loss strictly decrease while overfitting, provided the learning rate is sufficiently small, or can it vary?

Kushaj · October 10, 2020, 9:04pm

It should gradually decrease (some fluctuation is OK, but not strong fluctuation as in your case). If the model is correct, the fluctuation might be due to bad hyperparameters (probably lr or momentum). Also, do not use any data augmentation, weight decay, dropout, or any other fancy regularization trick.
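
To make the expected behaviour concrete, here is a minimal sketch of a single-batch overfit check. It is not a real network: it uses a toy linear model `y = w*x + b` in plain Python as a stand-in, and the data, learning rate, step count, and function names are all illustrative assumptions. With one fixed batch, plain gradient descent (no momentum), and a small enough learning rate, the loss on this convex toy problem should go down at every step.

```python
# Toy stand-in for a network: fit y = w*x + b to one fixed batch of two samples.
# The data, learning rate, and step count are illustrative assumptions.

def mse_and_grads(w, b, xs, ys):
    """Return the mean squared error and its gradients w.r.t. w and b."""
    n = len(xs)
    residuals = [w * x + b - y for x, y in zip(xs, ys)]
    loss = sum(r * r for r in residuals) / n
    grad_w = 2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
    grad_b = 2.0 * sum(residuals) / n
    return loss, grad_w, grad_b


def overfit_single_batch(xs, ys, lr=0.1, steps=500):
    """Run plain gradient descent on the same batch and record the loss per step."""
    w, b = 0.0, 0.0
    losses = []
    for _ in range(steps):
        loss, grad_w, grad_b = mse_and_grads(w, b, xs, ys)
        losses.append(loss)
        w -= lr * grad_w
        b -= lr * grad_b
    return losses


batch_x = [1.0, 2.0]  # one fixed batch, reused unchanged at every step
batch_y = [3.0, 5.0]  # generated by y = 2x + 1, so a loss of 0 is reachable

losses = overfit_single_batch(batch_x, batch_y)

# Count the steps where the loss went up compared to the previous step.
increases = sum(1 for prev, cur in zip(losses, losses[1:]) if cur > prev)
print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.2e}, "
      f"steps where loss rose: {increases}")
```

A real network is not convex, and optimizers with momentum or adaptive steps can produce small bumps, so treat the strict decrease here as a property of this toy setup rather than a guarantee for every model.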

spadel · October 11, 2020, 10:56am

I can’t believe I overlooked that - I had a random augmentation activated within the DataLoader! Thanks for pointing that out!
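
For anyone hitting the same issue, the effect can be reproduced with a small sketch. This is a plain-Python toy (a linear model instead of a network), and the Gaussian input jitter is only a hypothetical stand-in for whatever random augmentation a data pipeline applies; the function names, noise level, and learning rate are assumptions for illustration. The same batch is trained on twice: once unchanged, once with fresh random jitter at every step.

```python
import random


def train_on_one_batch(xs, ys, noise_std, lr=0.1, steps=200, seed=0):
    """Gradient descent on one batch; optionally jitter the inputs each step."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    losses = []
    for _ in range(steps):
        if noise_std > 0:
            # Hypothetical "augmentation": new Gaussian noise on every step,
            # so the model never sees exactly the same batch twice.
            step_xs = [x + rng.gauss(0.0, noise_std) for x in xs]
        else:
            step_xs = xs
        n = len(step_xs)
        residuals = [w * x + b - y for x, y in zip(step_xs, ys)]
        losses.append(sum(r * r for r in residuals) / n)
        grad_w = 2.0 * sum(r * x for r, x in zip(residuals, step_xs)) / n
        grad_b = 2.0 * sum(residuals) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return losses


def count_increases(losses):
    """Number of steps where the loss is higher than on the previous step."""
    return sum(1 for prev, cur in zip(losses, losses[1:]) if cur > prev)


batch_x = [1.0, 2.0]
batch_y = [3.0, 5.0]  # generated by y = 2x + 1

clean_losses = train_on_one_batch(batch_x, batch_y, noise_std=0.0)
noisy_losses = train_on_one_batch(batch_x, batch_y, noise_std=0.2)

print("loss increases without augmentation:", count_increases(clean_losses))
print("loss increases with random augmentation:", count_increases(noisy_losses))
```

With the jitter enabled, the "single batch" is effectively a different batch at every step, so the loss keeps bouncing around instead of settling toward zero. For the overfit test, the augmentation should be switched off so the inputs are truly identical across steps.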