Frequent discontinuities in loss function during training

I know it’s very difficult to debug without context, but I’m asking just in case this is a known phenomenon. I’m training a U-Net architecture for regression. During training, I frequently see discontinuities in the loss function, which look very strange to me. Here’s what the training history looks like:

I’m using the Adam optimizer with a learning rate of 1e-3.

Are you using any learning rate schedulers, or are you resetting any objects (e.g. the optimizer)?

No, I use the default values:

optimizer = torch.optim.Adam(model.parameters(), lr=lr)

My bad, I think the graph shows a different parameter that I wanted to observe during training, not the actual loss function.

@ptrblck Just wanted to post an update in case you were interested. Indeed, I was monitoring a different parameter, but that was not the issue.

The issue was the eps parameter of the Adam optimizer, whose default value is 1e-8. I changed it to a much larger value (e.g. 1e-4) and I no longer see any fluctuations.
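For reference, a minimal sketch of the change (the model below is just a placeholder; any nn.Module works the same way, and the lr/eps values are the ones mentioned above):

import torch
import torch.nn as nn

model = nn.Linear(10, 1)  # placeholder standing in for the actual U-Net

# Default Adam uses eps=1e-8; a larger eps bounds the effective per-parameter
# step size whenever the second-moment estimate is close to zero.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, eps=1e-4)

Since Adam’s update is roughly lr * m_hat / (sqrt(v_hat) + eps), a tiny eps lets individual steps blow up whenever v_hat is near zero, while a larger eps caps them.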

That’s interesting to hear. Were you seeing the same behavior in the loss itself, too?

Yes, the actual loss behaves very similarly to the one shown in the screenshot.