'requires_grad = False' VS 'lr = 0'

I tried to fine-tune VGG16 by training only the last FC layer. When I set requires_grad = False for the previous layers, training converges and outputs reasonable results. But when I instead set the learning rate of the previous layers to 0 and keep the same learning rate for the last layer, it fails to converge.
Is there any difference between these two approaches?
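For reference, here is a minimal sketch of the two setups being compared, assuming torchvision's VGG16 and plain SGD (the variable names are illustrative, not the original code):

```python
import torch
import torchvision

# Approach 1: freeze earlier layers with requires_grad=False and pass only
# the trainable parameters to the optimizer.
model_a = torchvision.models.vgg16()
for param in model_a.parameters():
    param.requires_grad = False
for param in model_a.classifier[6].parameters():  # last FC layer in torchvision's VGG16
    param.requires_grad = True
opt_a = torch.optim.SGD(
    [p for p in model_a.parameters() if p.requires_grad], lr=0.01
)

# Approach 2: keep every parameter trainable, but give the earlier layers
# lr=0 via per-parameter groups.
model_b = torchvision.models.vgg16()
last_fc = list(model_b.classifier[6].parameters())
last_fc_ids = {id(p) for p in last_fc}
earlier = [p for p in model_b.parameters() if id(p) not in last_fc_ids]
opt_b = torch.optim.SGD(
    [{"params": earlier, "lr": 0.0}, {"params": last_fc, "lr": 0.01}]
)
```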


What’s your optimizer and how did you set the lrs?

I use the SGD optimizer. For the learning rate: when using 'requires_grad=False', I leave the previous layers out of the optimizer and set the lr of the last layer to 0.01. When using 'lr=0', I set the lrs of the previous layers to 0 and the last layer to 0.01.
As far as I understand, 'lr=0' should not update the parameters and should therefore be equivalent to 'requires_grad=False' (though it might lead to more computation).
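To make the equivalence claim concrete, a small self-contained check (illustrative, not from the original posts): with plain SGD and no momentum or weight decay, a step with lr=0 leaves the parameter unchanged, although autograd still computes its gradient.

```python
import torch

w = torch.nn.Parameter(torch.randn(3))
opt = torch.optim.SGD([w], lr=0.0)  # lr=0 is a valid setting for SGD

before = w.detach().clone()
loss = (w ** 2).sum()
loss.backward()   # the gradient is still computed (extra work compared to freezing)
opt.step()        # the update is p -= lr * grad = 0, so the value does not change

print(torch.equal(w.detach(), before))  # True: parameter unchanged
print(w.grad)                           # non-None: gradient was materialized anyway
```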

Sorry for not being clear. What methods did you use to set the lrs?

Sorry, my fault. This problem is not related to PyTorch itself. I ran into numerical instability elsewhere in my code when I used 'lr=0'. 'requires_grad=False' and 'lr=0' do produce the same results. By the way, 'lr=0' requires more computation.
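To illustrate the "more computation" point, a short sketch (tensor names are hypothetical): with requires_grad=False, autograd never materializes a gradient for the frozen parameter, whereas with lr=0 the backward pass through those layers still runs in full.

```python
import torch

frozen = torch.nn.Parameter(torch.randn(3), requires_grad=False)
trainable = torch.nn.Parameter(torch.randn(3))

loss = (frozen * trainable).sum()
loss.backward()

print(frozen.grad)     # None: no gradient is computed or stored for the frozen parameter
print(trainable.grad)  # a tensor: gradients are still computed for trainable parameters
```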

Thank you.
