Optimizer.step not working though grads exist after requires_grad_ switch

I have a network that trains fine for a while. Then I call requires_grad_(False) on some layers, and later call requires_grad_() on those layers while calling requires_grad_(False) on the other layers. At this point training completely stops. Following the thread "Gradients exist but weights not updating", I have this check around the update:

if epoch == test:
    # snapshot the parameters before the update so we can compare afterwards
    before = [p.clone() for p in cnn.parameters()]
loss.backward()
optimizer.step()
if epoch == test:
    after = list(cnn.parameters())
    for i in range(len(before)):
        # True means this parameter did not change during optimizer.step()
        print(torch.equal(before[i].data, after[i].data))
        if after[i].grad is not None:
            print(after[i].grad.data.max())
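For context, the freeze/unfreeze switch itself is done roughly like this (just a sketch; cnn.features and cnn.classifier are placeholder layer groups, not my actual layer names):

# early epochs: freeze one group, train the rest
for p in cnn.features.parameters():
    p.requires_grad_(False)

# later epochs: unfreeze that group and freeze the other one instead
for p in cnn.features.parameters():
    p.requires_grad_(True)
for p in cnn.classifier.parameters():
    p.requires_grad_(False)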

On epoch 0, before the requires_grad switch, it works:

False
tensor(7.1678e-08, device='cuda:0', dtype=torch.float64)
False
tensor(3.6259e-06, device='cuda:0', dtype=torch.float64)
True
True
False
tensor(1.1820e-05, device='cuda:0', dtype=torch.float64)
False
tensor(2.0888e-05, device='cuda:0', dtype=torch.float64)
True
True
False
tensor(8.6558e-05, device='cuda:0', dtype=torch.float64)
False
tensor(0.0002, device='cuda:0', dtype=torch.float64)
True
True
False
tensor(0.0020, device='cuda:0', dtype=torch.float64)
False
tensor(0.0041, device='cuda:0', dtype=torch.float64)
True
True

But then on epoch 2, after the requires_grad switch, the gradients are still calculated but the weights don't get updated:

True
tensor(-6.2089e-07, device='cuda:0', dtype=torch.float64)
True
tensor(1.1862e-06, device='cuda:0', dtype=torch.float64)
True
tensor(-1.1782e-06, device='cuda:0', dtype=torch.float64)
True
True
True
tensor(2.5829e-06, device='cuda:0', dtype=torch.float64)
True
tensor(4.0316e-06, device='cuda:0', dtype=torch.float64)
True
tensor(5.7105e-06, device='cuda:0', dtype=torch.float64)
True
True
True
tensor(3.4637e-05, device='cuda:0', dtype=torch.float64)
True
tensor(5.0992e-05, device='cuda:0', dtype=torch.float64)
True
tensor(8.0335e-05, device='cuda:0', dtype=torch.float64)
True
True
True
tensor(0.0016, device='cuda:0', dtype=torch.float64)
True
tensor(0.0009, device='cuda:0', dtype=torch.float64)
True
tensor(0.0019, device='cuda:0', dtype=torch.float64)
True
True

Are you changing the learning rate as well when you change these properties?
Or anything else with the optimizers?

Nope, everything else is the same.

Could you give a small, runnable code sample that reproduces this, please?

Looks like the problem actually was the optimizer. One other thing I was doing is reloading the model. I think with an older version of PyTorch the optimizer would automatically point to the reloaded layers, but now it appears it has to be reinitialized. Looks like that fixed it!

Hi,

Do you mind sharing your solution?
I ran into a very similar problem to yours.
Thank you!

Sorry if that explanation wasn't clear. I'm not home, but this is the pseudocode of what was happening.

model = initializeFunction()
optimizer = optimizer(model.params)

# train for some epochs

model = loadBestModel()
# train some more

After loadBestModel() I needed to reinitialize the optimizer. Basically make sure that your optimizer is actually pointing to the parameters that have gradients.
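To make that concrete, here is a minimal sketch of the fix, assuming loadBestModel() returns a brand-new model object and using SGD as a stand-in for whatever optimizer you actually have (the learning rate is a placeholder):

model = loadBestModel()   # returns a *new* model object with its own parameter tensors
# the old optimizer still holds references to the previous model's parameters,
# so optimizer.step() would keep updating tensors that no longer receive gradients
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)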

Thanks for your response.
What do you mean by reinitialize the optimizer?
Is it just to run the line below again?
optimizer = optimizer(model.params)

In my case, I am trying to train one model with two different datasets.
But the model failed to recognize the test data from the second dataset at all.
My flow is as follows:

model = initializeFunction()

optimizer = optimizer(model.params)
train for some epochs using dataset1 with optimizer
test with dataset1

optimizer_1 = optimizer(model.params)
train for some epochs using dataset1 with optimizer_1
test with dataset2

Yes, I do mean to just rerun optimizer = optimizer(model.params). Make sure that if you have optimizer and optimizer_1, each one is correct, and if any of these are being created in separate functions, make sure you are returning the new optimizer/model that you want to use from those functions. I'm guessing the core of your problem is the same as mine: when you call optimizer.step(), that optimizer is not actually pointing to the tensors that have the gradients. But there are a number of ways that could have happened.
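A quick way to check that (just a sketch, adapt the names to your code) is to compare the tensors the optimizer holds with the model's current parameters:

# identities of every tensor the optimizer will update on step()
opt_params = {id(p) for group in optimizer.param_groups for p in group['params']}
# identities of every parameter the model actually uses in the forward pass
model_params = {id(p) for p in model.parameters()}
print(opt_params == model_params)  # False means the optimizer is stepping stale tensors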

Thanks for your help!
Following your suggestion, I found the problem.
During the second training, the model weights were not changing at all.
I needed to explicitly add another model.train() call to fix this issue.
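For reference, this is roughly where that call ends up in the second stage (a sketch; stage2_loader, num_epochs, the optimizer, and the loss are all placeholders for whatever you actually use):

model.train()                                    # put the model back in training mode
optimizer_1 = torch.optim.SGD(model.parameters(), lr=0.01)
for epoch in range(num_epochs):
    for x, y in stage2_loader:
        optimizer_1.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(x), y)
        loss.backward()
        optimizer_1.step()
model.eval()                                     # and back to eval mode before testing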
