I am trying to update the parameters of different modules of a model separately. The model has a backbone and a head, each with its own optimizer, and after computing the loss I call optimizer.step() on each optimizer separately. I expect that calling .step() on the head optimizer leaves the backbone parameters unchanged and, similarly, that calling .step() on the backbone optimizer leaves the head parameters unchanged. However, that is not what I observe.
This is my code:
import torch


class TestModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = torch.nn.Sequential(
            torch.nn.Linear(100, 50, bias=False),
            torch.nn.BatchNorm1d(50),
            torch.nn.ReLU(),
            torch.nn.Linear(50, 50, bias=False),
            torch.nn.BatchNorm1d(50),
            torch.nn.ReLU(),
        )
        self.head = torch.nn.Sequential(
            torch.nn.Linear(50, 20),
            torch.nn.ReLU(),
            torch.nn.Linear(20, 10),
            torch.nn.ReLU(),
        )

    def forward(self, x):
        return self.head(self.backbone(x))


model = TestModel()

# String snapshots of both state dicts, taken before the forward pass.
b = str(model.backbone.state_dict())
h = str(model.head.state_dict())

# One optimizer per module.
backbone_optimizer = torch.optim.Adam(model.backbone.parameters())
head_optimizer = torch.optim.Adam(model.head.parameters())

x1 = torch.rand(10, 100)
y1 = model(x1)
loss = torch.nn.MSELoss()
loss_ = loss(y1, torch.ones_like(y1))
loss_.backward()

# Step only the head optimizer, then compare against the first snapshots.
head_optimizer.step()
print('--------------head backpropagation-------------------')
b1 = str(model.backbone.state_dict())
h1 = str(model.head.state_dict())
print('backbone parameters changed' if b != b1 else 'backbone parameters not changed')
print('head parameters changed' if h != h1 else 'head parameters not changed')

# Step only the backbone optimizer, then compare against the previous snapshots.
backbone_optimizer.step()
print('---------------backbone backpropagation-----------------')
b2 = str(model.backbone.state_dict())
h2 = str(model.head.state_dict())
print('backbone parameters changed' if b1 != b2 else 'backbone parameters not changed')
print('head parameters changed' if h1 != h2 else 'head parameters not changed')
I get the following output:
--------------head backpropagation-------------------
backbone parameters changed
head parameters changed
---------------backbone backpropagation-----------------
backbone parameters changed
head parameters not changed
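One detail of the comparison itself may matter here: state_dict() returns buffers as well as parameters, and BatchNorm1d stores running_mean, running_var and num_batches_tracked as buffers that are updated during a forward pass in training mode. The following is a minimal sketch for checking parameters and buffers separately. It uses a reduced stand-in for the backbone, and snapshot and changed_names are hypothetical helpers, not part of the code above.

```python
import torch

# Reduced stand-in for the backbone above: Linear + BatchNorm1d + ReLU.
backbone = torch.nn.Sequential(
    torch.nn.Linear(100, 50, bias=False),
    torch.nn.BatchNorm1d(50),
    torch.nn.ReLU(),
)


def snapshot(named_tensors):
    """Clone every tensor so later in-place updates do not affect the copy."""
    return {name: t.detach().clone() for name, t in named_tensors}


def changed_names(before, named_tensors):
    """Return the names whose current values differ from the snapshot."""
    return [
        name
        for name, t in named_tensors
        if not torch.equal(before[name], t.detach())
    ]


params_before = snapshot(backbone.named_parameters())
buffers_before = snapshot(backbone.named_buffers())

# A single forward pass in training mode; no optimizer step is taken.
backbone.train()
_ = backbone(torch.rand(10, 100))

changed_params = changed_names(params_before, backbone.named_parameters())
changed_buffers = changed_names(buffers_before, backbone.named_buffers())
print("changed parameters:", changed_params)
print("changed buffers:", changed_buffers)
```

Comparing tensors with torch.equal, rather than comparing printed strings, also avoids relying on the tensor repr, which abbreviates large tensors.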
I am not sure why this is happening, since I explicitly pass each module's parameters to its own optimizer. How do I get each optimizer to update only its own module's parameters? For instance, if I wanted to accumulate gradients for the backbone but not for the head, I would need two optimizers that do not update each other's parameters. How can this be achieved? Thanks in advance for your help.
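To make the intended use case concrete, the gradient-accumulation scenario could look roughly like the sketch below. It is only an illustration under stated assumptions: the small backbone and head are stand-ins for the modules above, and accum_steps and the two update counters are hypothetical names that do not appear in the original code.

```python
import torch

# Hypothetical stand-ins for the backbone and head (no BatchNorm, for brevity).
backbone = torch.nn.Sequential(torch.nn.Linear(100, 50), torch.nn.ReLU())
head = torch.nn.Sequential(torch.nn.Linear(50, 10))

backbone_optimizer = torch.optim.Adam(backbone.parameters())
head_optimizer = torch.optim.Adam(head.parameters())
loss_fn = torch.nn.MSELoss()

accum_steps = 4  # hypothetical: the backbone is updated once every 4 batches
backbone_updates = 0
head_updates = 0

for step in range(1, 9):
    x = torch.rand(10, 100)
    out = head(backbone(x))
    loss = loss_fn(out, torch.ones_like(out))
    # backward() adds to the .grad of every parameter in both modules.
    loss.backward()

    # Head: update on every batch, then clear only the head gradients.
    head_optimizer.step()
    head_optimizer.zero_grad()
    head_updates += 1

    # Backbone: gradients keep summing until every accum_steps-th batch.
    if step % accum_steps == 0:
        backbone_optimizer.step()
        backbone_optimizer.zero_grad()
        backbone_updates += 1

print("head updates:", head_updates)
print("backbone updates:", backbone_updates)
```

Because each optimizer's zero_grad() only touches the parameters it was constructed with, the backbone gradients are summed (not averaged) across the accumulated batches; dividing the loss by accum_steps would be one way to average them instead.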