UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`

Hey,

I just want to confirm a hypothesis. I had a similar problem when using `autocast` and `GradScaler`; the problem didn't occur in full precision (FP32). It seemed to happen right at the first iteration of training, and I was using a pre-trained model.

To find the problem I used `detect_anomaly`, and the origin seemed to be right at the beginning of the backprop (I'm using the cross-entropy loss). I also looked at the input of the loss and at the weights of the model, and everything seemed fine. However, I noticed that even in normal precision some of the weights had really high gradients.
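For reference, a minimal sketch of how anomaly detection can be enabled to locate where a NaN/Inf first appears in the backward pass (the tiny input and targets here are just placeholders, not my actual model):

```python
import torch

# Placeholder "logits" and targets standing in for a real model's output.
x = torch.randn(4, 3, requires_grad=True)
target = torch.tensor([0, 1, 2, 0])

# Inside this context, a NaN/Inf produced during backward raises an error
# whose traceback points at the forward op that created it.
with torch.autograd.detect_anomaly():
    loss = torch.nn.functional.cross_entropy(x, target)
    loss.backward()

print(x.grad.shape)  # gradients exist for the leaf tensor
```

Note that anomaly detection slows training down considerably, so it should only be enabled while debugging.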

I figured the problem was maybe caused by `GradScaler` scaling some of the already-large gradients too high, so they became Inf. When `scaler.step()` finds Inf gradients it skips `optimizer.step()` for that iteration, but `lr_scheduler.step()` still runs, which would produce exactly the warning above. The problem seemed to be fixed by lowering the `init_scale` of `GradScaler`: I'm currently using a value of 2^14 instead of the default 2^16, and everything seems to work fine now.
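The overflow mechanism itself is easy to check in isolation. FP16 tops out at 65504, so a gradient whose magnitude exceeds ~1.0 overflows once multiplied by the default scale of 2^16, while 2^14 leaves extra headroom (the gradient value here is made up for illustration):

```python
import torch

# A "really high" gradient value, as observed on some weights.
grad = torch.tensor([2.0], dtype=torch.float16)

scaled_default = grad * 2**16  # 2.0 * 65536 = 131072 > 65504 -> Inf in FP16
scaled_lower = grad * 2**14    # 2.0 * 16384 = 32768, still representable

print(torch.isinf(scaled_default).item())  # True
print(torch.isinf(scaled_lower).item())    # False
```

The corresponding change in the training script is just `torch.cuda.amp.GradScaler(init_scale=2**14)`. A lower initial scale only affects the first iterations anyway, since the scaler halves itself whenever it detects Inf and grows back over time.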

Can you confirm that this hypothesis holds up and that I'm not missing something?

(I also wanted to put this somewhere since I didn't find anything on this issue anywhere.)

Thanks!
