Frequent discontinuities in loss function during training

ma-sadeghi · November 24, 2022, 7:24pm

I know it’s very difficult to debug without context, but I’m asking just in case this is a known phenomenon. I’m training a U-Net architecture for regression. During training, I frequently see discontinuities in the loss funciton, which look very strange to me. Here’s how the training history looks like:

I’m using Adam optimizer with a learning rate of 1e-3.

ptrblck · November 24, 2022, 7:54pm

Are you using any learning rate schedulers or are resetting any objects (e.g. the optimizer)?

ma-sadeghi · November 24, 2022, 8:20pm

No, I use the default values:

optimizer = torch.optim.Adam(model.parameters(), lr=lr)

ma-sadeghi · November 28, 2022, 6:34pm

My bad, I think the graph shows a different paramter that I wanted to observe during training, not the actual loss function.

ma-sadeghi · December 8, 2022, 2:14am

@ptrblck Just wanted to post an update in case you were interested. Indeed, I was monitoring a different parameter, but that was not the issue.

The issue was the eps parameter of Adam optimizer, whose default value is 1e-8. I changed it to a much larger value (ex. 1e-4) and I no longer see any fluctuations.

ptrblck · December 8, 2022, 6:21am

That’s interesting to hear. Were you seeing the same behavior in the loss itself, too?

ma-sadeghi · December 8, 2022, 3:44pm

Yes, the actual loss behaves very similar to the one shown in the screenshot.