Train MSE doesn't converge to 0 when overfitting a single batch

milan_kalkenings · June 24, 2023, 8:50am

I debug my model by training for many iterations on just one training batch and monitoring the loss on that same batch.
The loss should converge to 0, but it clearly doesn't.
It decreases very smoothly and then just plateaus.
Any idea what I am doing wrong?

batch size: 32
task: image super resolution
loss (of interest): mse (mean reduced)
probabilistic modules in network: no (no dropout, noise adding, …)
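
For readers who want to reproduce this kind of check, the setup described above might look roughly like the sketch below. The tiny convolutional model, the random tensors, the learning rate, and the step count are all placeholders chosen for illustration, not the actual network or data from this post.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the real super-resolution network.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 3, kernel_size=3, padding=1),
)

# One fixed batch of 32 random "images"; input and target share a size
# only to keep the sketch short (real SR targets would be larger).
inputs = torch.rand(32, 3, 16, 16)
targets = torch.rand(32, 3, 16, 16)

criterion = nn.MSELoss(reduction="mean")
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # placeholder lr

model.train()
losses = []
for step in range(200):
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)  # always the same batch
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

Plotting `losses` on a log scale can make it easier to tell a true plateau from a very slow decrease.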

[Image: loss plot, file name "2, 8, 16, 32, 64, 128, 256, 512_combined_2_1lr_0.0001_losses"]

srishti-git1110 · June 24, 2023, 9:07am

I don’t see this as totally unexpected or necessarily wrong.
Did you try changing the network architecture, for example by adding more layers to give it more parameters? The intuition is that the current architecture may simply not have enough parameters to fit every example in this one batch.

Another way to get out of loss stagnation is to adjust the learning rate, for example with a scheduler or learning-rate decay.
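
As one possible illustration of a scheduler, the sketch below uses PyTorch's built-in `ReduceLROnPlateau`, which lowers the learning rate when a monitored value stops improving. Picking this particular scheduler, the dummy parameter, and the `factor`/`patience` values are my assumptions, and the constant loss is faked to show the mechanism.

```python
import torch

# Dummy parameter so the optimizer has something to manage
# (placeholder for a real model's parameters).
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([param], lr=0.1)

# Halve the learning rate once the monitored loss has failed to
# improve for more than `patience` consecutive checks.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=5
)

lrs = []
for epoch in range(10):
    stalled_loss = 0.25  # pretend the training loss is stuck here
    # In real training, call optimizer.step() before this line.
    scheduler.step(stalled_loss)
    lrs.append(optimizer.param_groups[0]["lr"])

print(lrs)
```

In a real loop the value passed to `scheduler.step` would be the measured training (or validation) loss for that epoch.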

milan_kalkenings · June 24, 2023, 9:38am

Interestingly, it plateaus at the same value (within about 2% variance) for all learning rates and network widths/depths I tried.

Learning rates I tried: [0.1, 0.01, 0.001, 0.0001, 0.00001, 0.000001]

srishti-git1110 · June 24, 2023, 6:27pm

I see.
A constant loss value implies the network has stopped learning (or is improving so slowly that the loss looks constant). One way to get out of it is to increase the learning rate once the loss has stayed constant for a certain number of epochs (lr scheduling).
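
A minimal sketch of that idea in plain Python follows. The helper name `lr_after_plateau` and its parameters (`patience`, `rel_tol`, `factor`) are hypothetical, made up for this illustration rather than taken from any library.

```python
def lr_after_plateau(losses, lr, patience=5, rel_tol=1e-3, factor=2.0):
    """Return lr, multiplied by `factor` if the recent losses look constant.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    if len(losses) <= patience:
        return lr  # not enough history to judge yet
    recent = losses[-(patience + 1):]
    # Call the loss "constant" if its spread over the window is tiny
    # relative to the loss at the start of the window.
    spread = max(recent) - min(recent)
    scale = max(abs(recent[0]), 1e-12)
    if spread / scale < rel_tol:
        return lr * factor
    return lr


history = [1.0, 0.5, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]
new_lr = lr_after_plateau(history, lr=1e-4)
print(new_lr)
```

With PyTorch, the returned value could then be written into each entry of `optimizer.param_groups` under the `"lr"` key.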

Anyway, I would still say that failing to reach exactly zero loss on a batch doesn’t necessarily mean something is wrong.