How to update parameters module-wise with separate optimizers

I am trying to update the parameters of different modules of a model separately. I have a backbone and a head, I use a separate optimizer for each, and after computing the loss I call optimizer.step() on each of them individually. My expectation is that when I call .step() on the head optimizer, the backbone parameters are not updated, and likewise that calling .step() on the backbone optimizer leaves the head parameters untouched. However, I am getting different results.

This is my code:

import torch


class TestModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = torch.nn.Sequential(
            *[
                torch.nn.Linear(100, 50, bias=False),
                torch.nn.BatchNorm1d(50),
                torch.nn.ReLU(),
                torch.nn.Linear(50, 50, bias=False),
                torch.nn.BatchNorm1d(50),
                torch.nn.ReLU(),
            ]
        )
        self.head = torch.nn.Sequential(
            *[
                torch.nn.Linear(50, 20),
                torch.nn.ReLU(),
                torch.nn.Linear(20, 10),
                torch.nn.ReLU(),
            ]
        )
    
    def forward(self, x):
        return self.head(self.backbone(x))


model = TestModel()
b = model.backbone.state_dict().__str__()
h = model.head.state_dict().__str__()

backbone_optimizer = torch.optim.Adam(model.backbone.parameters())
head_optimizer = torch.optim.Adam(model.head.parameters())

x1 = torch.rand(10, 100)
y1 = model(x1)
loss = torch.nn.MSELoss()
loss_ = loss(y1, torch.ones_like(y1))
loss_.backward()

head_optimizer.step()
print('--------------head backpropagation-------------------')
b1 = model.backbone.state_dict().__str__()
h1 = model.head.state_dict().__str__()
print('backbone parameters changed' if b != b1 else 'backbone parameters not changed')
print('head parameters changed' if h != h1 else 'head parameters not changed')

backbone_optimizer.step()
print('---------------backbone backpropagation-----------------')
b2 = model.backbone.state_dict().__str__()
h2 = model.head.state_dict().__str__()
print('backbone parameters changed' if b1 != b2 else 'backbone parameters not changed')
print('head parameters changed' if h1 != h2 else 'head parameters not changed')

I get the following output:

--------------head backpropagation-------------------
backbone parameters changed
head parameters changed
---------------backbone backpropagation-----------------
backbone parameters changed
head parameters not changed

I am not sure why this is happening, since I explicitly pass each module's parameters to its own optimizer. How do I get each optimizer to update only its own module's parameters? For instance, if I wanted to accumulate gradients for the backbone but not for the head, I would need two optimizers that do not touch each other's parameters. How can this be achieved? Thanks in advance for your help.
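
To clarify, this is roughly the update pattern I am hoping for (a rough sketch reusing the objects defined above; the per-optimizer zero_grad() calls are just my assumption about how the gradient accumulation would be managed):

# Sketch of the goal: update the head every iteration, but accumulate backbone
# gradients and apply them only every few iterations.
for step in range(4):
    y = model(torch.rand(10, 100))
    loss(y, torch.ones_like(y)).backward()

    head_optimizer.step()        # should only touch model.head parameters
    head_optimizer.zero_grad()   # clear the head gradients after every step

    if (step + 1) % 4 == 0:
        backbone_optimizer.step()       # should only touch model.backbone parameters
        backbone_optimizer.zero_grad()  # clear the accumulated backbone gradients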

I believe this is because simply running a forward pass through the model updates the running statistics (running_mean and running_var) of the BatchNorm layers. These running statistics are buffers, not parameters, but they do appear in the state_dict, so comparing the stringified state_dicts makes the backbone look as if it had changed even though no optimizer step touched it. If I remove the BatchNorm layers from your TestModel, I get the following output:

--------------head backpropagation-------------------
backbone parameters not changed
head parameters changed
---------------backbone backpropagation-----------------
backbone parameters changed
head parameters not changed
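
Alternatively, you can keep the BatchNorm layers and compare only the trainable parameters instead of the full state_dict, since the running statistics are buffers rather than parameters. A minimal sketch, assuming a freshly constructed model, optimizers, x1 and loss as in your code above:

# Snapshot only the trainable parameters; BatchNorm running statistics are buffers
# and therefore do not appear in .parameters().
backbone_before = [p.detach().clone() for p in model.backbone.parameters()]
head_before = [p.detach().clone() for p in model.head.parameters()]

y1 = model(x1)
loss(y1, torch.ones_like(y1)).backward()
head_optimizer.step()

backbone_changed = any(
    not torch.equal(before, after)
    for before, after in zip(backbone_before, model.backbone.parameters())
)
head_changed = any(
    not torch.equal(before, after)
    for before, after in zip(head_before, model.head.parameters())
)
print('backbone parameters changed' if backbone_changed else 'backbone parameters not changed')
print('head parameters changed' if head_changed else 'head parameters not changed')

With this comparison the backbone should report "not changed" after the head step, even with the BatchNorm layers in place.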

You can also manually inspect what is changing by printing out and comparing the stringified model state dicts.
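
Rather than eyeballing the printed strings, you can also diff the state_dict entries key by key; something along these lines (again just a sketch, using your model definition):

# Diff two backbone state_dict snapshots key by key. With BatchNorm present, only
# the running_mean / running_var / num_batches_tracked buffers should show up as
# changed after a forward pass that is not followed by an optimizer step.
before = {k: v.clone() for k, v in model.backbone.state_dict().items()}
model(torch.rand(10, 100))  # forward pass only, no backward() or step()
after = model.backbone.state_dict()
for key in before:
    if not torch.equal(before[key], after[key]):
        print(f'{key} changed')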