Optimizer.step not working though grads exist after requires_grad_ switch

I have a network that trains fine for a while. Then I call requires_grad_(False) on some layers, and later call requires_grad_() on those layers while calling requires_grad_(False) on the other layers. At this point training completely stops. Following the thread "Gradients exist but weights not updating", I have this check around the update:

if epoch == test:
    # snapshot the parameters before the update so we can compare afterwards
    before = [p.clone() for p in cnn.parameters()]
loss.backward()
optimizer.step()
if epoch == test:
    after = list(cnn.parameters())
    for i in range(len(before)):
        # True means this parameter did not change during optimizer.step()
        print(torch.equal(before[i].data, after[i].data))
        if after[i].grad is not None:
            print(after[i].grad.data.max())
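For context, the freeze/unfreeze switch itself is done roughly like this (just a sketch; cnn.features and cnn.classifier are placeholder layer groups, not my actual layer names):

# early epochs: freeze one group, train the rest
for p in cnn.features.parameters():
    p.requires_grad_(False)

# later epochs: unfreeze that group and freeze the other one instead
for p in cnn.features.parameters():
    p.requires_grad_(True)
for p in cnn.classifier.parameters():
    p.requires_grad_(False)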

On epoch 0, before the requires_grad switch, it works:

False
tensor(7.1678e-08, device='cuda:0', dtype=torch.float64)
False
tensor(3.6259e-06, device='cuda:0', dtype=torch.float64)
True
True
False
tensor(1.1820e-05, device='cuda:0', dtype=torch.float64)
False
tensor(2.0888e-05, device='cuda:0', dtype=torch.float64)
True
True
False
tensor(8.6558e-05, device='cuda:0', dtype=torch.float64)
False
tensor(0.0002, device='cuda:0', dtype=torch.float64)
True
True
False
tensor(0.0020, device='cuda:0', dtype=torch.float64)
False
tensor(0.0041, device='cuda:0', dtype=torch.float64)
True
True

But then on epoch 2, after the requires_grad switch, the gradients are still calculated but the weights don't get updated:

True
tensor(-6.2089e-07, device='cuda:0', dtype=torch.float64)
True
tensor(1.1862e-06, device='cuda:0', dtype=torch.float64)
True
tensor(-1.1782e-06, device='cuda:0', dtype=torch.float64)
True
True
True
tensor(2.5829e-06, device='cuda:0', dtype=torch.float64)
True
tensor(4.0316e-06, device='cuda:0', dtype=torch.float64)
True
tensor(5.7105e-06, device='cuda:0', dtype=torch.float64)
True
True
True
tensor(3.4637e-05, device='cuda:0', dtype=torch.float64)
True
tensor(5.0992e-05, device='cuda:0', dtype=torch.float64)
True
tensor(8.0335e-05, device='cuda:0', dtype=torch.float64)
True
True
True
tensor(0.0016, device='cuda:0', dtype=torch.float64)
True
tensor(0.0009, device='cuda:0', dtype=torch.float64)
True
tensor(0.0019, device='cuda:0', dtype=torch.float64)
True
True

Are you changing the learning rate as well when you change these properties?
Or anything else with the optimizers?

Nope, everything else is the same.

Could you give a small, runnable code sample that reproduces this, please?

Looks like the problem actually was the optimizer. One other thing I was doing is reloading the model. I think with an older version of PyTorch the optimizer would automatically point to the reloaded layers, but now it appears it has to be reinitialized. Looks like that fixed it!

Hi,

Do you mind sharing your solution?
I ran into a very similar problem to yours.
Thank you!

Sorry if that explanation wasn't clear. I'm not home, but this is the pseudocode of what was happening.

model = initializeFunction()
optimizer = optimizer(model.params)

# train for some epochs

model = loadBestModel()
# train some more

After loadBestModel() I needed to reinitialize the optimizer. Basically make sure that your optimizer is actually pointing to the parameters that have gradients.
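To make that concrete, here is a minimal sketch of the fix, assuming loadBestModel() returns a brand-new model object and using SGD as a stand-in for whatever optimizer you actually have (the learning rate is a placeholder):

model = loadBestModel()   # returns a *new* model object with its own parameter tensors
# the old optimizer still holds references to the previous model's parameters,
# so optimizer.step() would keep updating tensors that no longer receive gradients
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)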

Thanks for your response.
What do you mean by reinitialize the optimizer?
Is it just to run the line below again?
optimizer = optimizer(model.params)

In my case, I am trying to train one model with two different datasets.
But the model failed to recognize the test data from the second dataset at all.
My flow is as follows:

model = initializeFunction()

optimizer = optimizer(model.params)
train for some epochs using dataset1 with optimizer
test with dataset1

optimizer_1 = optimizer(model.params)
train for some epochs using dataset1 with optimizer_1
test with dataset2

Yes, I do mean to just rerun optimizer = optimizer(model.params). Make sure that if you have optimizer and optimizer_1, each one is correct, and if any of these are being created in separate functions, make sure you are returning the new optimizer/model that you want to use from those functions. I'm guessing the core of your problem is the same as mine: when you call optimizer.step(), that optimizer is not actually pointing to the tensors that have the gradients. But there are a number of ways that could have happened.
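A quick way to check that (just a sketch, adapt the names to your code) is to compare the tensors the optimizer holds with the model's current parameters:

# identities of every tensor the optimizer will update on step()
opt_params = {id(p) for group in optimizer.param_groups for p in group['params']}
# identities of every parameter the model actually uses in the forward pass
model_params = {id(p) for p in model.parameters()}
print(opt_params == model_params)  # False means the optimizer is stepping stale tensors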

Thanks for your help!
Following your suggestion, I found the problem.
During the second training, the model weights were not changing at all.
I needed to explicitly add another model.train() call to fix this issue.
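For reference, this is roughly where that call ends up in the second stage (a sketch; stage2_loader, num_epochs, the optimizer, and the loss are all placeholders for whatever you actually use):

model.train()                                    # put the model back in training mode
optimizer_1 = torch.optim.SGD(model.parameters(), lr=0.01)
for epoch in range(num_epochs):
    for x, y in stage2_loader:
        optimizer_1.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(x), y)
        loss.backward()
        optimizer_1.step()
model.eval()                                     # and back to eval mode before testing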
