'requires_grad = False' VS 'lr = 0'

I tried to fine-tune VGG16 by training only the last FC layer. When I set requires_grad = False for the previous layers, training converges and outputs reasonable results. But when I instead set the learning rate of the previous layers to 0 and keep the same learning rate for the last layer, it fails to converge.
Is there any difference between these two approaches?
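For reference, here is a minimal sketch of the two setups being compared, assuming torchvision's VGG16 and plain SGD (the variable names are illustrative, not the original code):

```python
import torch
import torchvision

# Approach 1: freeze earlier layers with requires_grad=False and pass only
# the trainable parameters to the optimizer.
model_a = torchvision.models.vgg16()
for param in model_a.parameters():
    param.requires_grad = False
for param in model_a.classifier[6].parameters():  # last FC layer in torchvision's VGG16
    param.requires_grad = True
opt_a = torch.optim.SGD(
    [p for p in model_a.parameters() if p.requires_grad], lr=0.01
)

# Approach 2: keep every parameter trainable, but give the earlier layers
# lr=0 via per-parameter groups.
model_b = torchvision.models.vgg16()
last_fc = list(model_b.classifier[6].parameters())
last_fc_ids = {id(p) for p in last_fc}
earlier = [p for p in model_b.parameters() if id(p) not in last_fc_ids]
opt_b = torch.optim.SGD(
    [{"params": earlier, "lr": 0.0}, {"params": last_fc, "lr": 0.01}]
)
```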


What’s your optimizer and how did you set the lrs?

I use the SGD optimizer. For the learning rate: when using 'requires_grad=False', I leave the previous layers out of the optimizer and set the lr of the last layer to 0.01. When using 'lr=0', I set the lrs of the previous layers to 0 and the last layer to 0.01.
As far as I understand, 'lr=0' should not update the parameters and should therefore be equivalent to 'requires_grad=False' (though it might lead to more computation).
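To make the equivalence claim concrete, a small self-contained check (illustrative, not from the original posts): with plain SGD and no momentum or weight decay, a step with lr=0 leaves the parameter unchanged, although autograd still computes its gradient.

```python
import torch

w = torch.nn.Parameter(torch.randn(3))
opt = torch.optim.SGD([w], lr=0.0)  # lr=0 is a valid setting for SGD

before = w.detach().clone()
loss = (w ** 2).sum()
loss.backward()   # the gradient is still computed (extra work compared to freezing)
opt.step()        # the update is p -= lr * grad = 0, so the value does not change

print(torch.equal(w.detach(), before))  # True: parameter unchanged
print(w.grad)                           # non-None: gradient was materialized anyway
```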

Sorry for not being clear. What methods did you use to set the lrs?

Sorry, my fault. This problem is not related to PyTorch itself. I ran into numerical instability elsewhere in my code when I used 'lr=0'. 'requires_grad=False' and 'lr=0' do produce the same results. By the way, 'lr=0' requires more computation.
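To illustrate the "more computation" point, a short sketch (tensor names are hypothetical): with requires_grad=False, autograd never materializes a gradient for the frozen parameter, whereas with lr=0 the backward pass through those layers still runs in full.

```python
import torch

frozen = torch.nn.Parameter(torch.randn(3), requires_grad=False)
trainable = torch.nn.Parameter(torch.randn(3))

loss = (frozen * trainable).sum()
loss.backward()

print(frozen.grad)     # None: no gradient is computed or stored for the frozen parameter
print(trainable.grad)  # a tensor: gradients are still computed for trainable parameters
```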

Thank you.
