I am facing a weird issue when trying to optimize the parameters of a custom module: the weights of certain layers are not updated, even though their gradients are computed.
CustomModule is defined as:
- A list of Batch Normalization (BN) layers, of which only one is chosen for the forward/backward pass. That is, the model will learn different gammas/betas depending on the selected BN layer.
- A fully-connected layer.
Of course, a structure defined like this does not make much sense; I just want to focus on the issue I encountered. Below is the structure of the CustomModule class:
import torch
import torch.nn as nn

class CustomModule(nn.Module):
    def __init__(self, num_batch_norm):
        super(CustomModule, self).__init__()
        # A plain Python list of BN layers; only one of them is used per forward pass.
        self.batch_norm_list = [nn.BatchNorm1d(10) for _ in range(num_batch_norm)]
        self.fc = nn.Linear(10, 2)

    def forward(self, x, num_bn):
        # Apply the BN layer selected by 'num_bn', then the fully-connected layer.
        x = self.batch_norm_list[num_bn](x)
        x = self.fc(x)
        return x
I create a model (an instance of CustomModule) with three BN layers, together with an SGD optimizer. After that, I define the test variables x and y and perform a single pass through the model (including the backward pass).
torch.manual_seed(0)

# Create a 'CustomModule' with three BN layers.
model = CustomModule(3)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

# Define test variables: the input (x) and the target (y).
x = torch.rand(5, 10)
y = torch.rand(5, 2)

# Pass 'x' through the 2nd (index=1) BN layer.
out = model(x, 1)
loss = torch.linalg.norm(out - y)

model.zero_grad()
loss.backward()
optimizer.step()
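As a side note, here is a small sanity check I could add on top of the snippet above (it is not part of my original code) to list the parameter tensors that model.parameters() exposes to the optimizer:

# Sanity check: list the parameters (names and shapes) that the model
# registers, i.e. the ones 'model.parameters()' hands to the optimizer.
for name, param in model.named_parameters():
    print(name, tuple(param.shape))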
I check that the gradients of the 2nd BN layer (the one chosen in the forward pass) are indeed computed:
# Check the weights' gradients in the 2nd BN layer.
model.batch_norm_list[1].weight.grad
# tensor([ 0.0810,  0.0052,  0.2871,  0.3091,  0.2437,  0.0916,  0.1410,  0.4406,
#         -0.0288,  0.0740])
However, when I check the weights (gammas) after the optimization step, I notice that they are not updated (in batch normalization, the gammas/weights are initialized to ones):
# Check the weights in the 2nd BN layer (after optimization).
model.batch_norm_list[1].weight
# Parameter containing:
# tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], requires_grad=True)
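For comparison (an extra check, not part of my original code), the fully-connected layer, which is registered as a regular submodule, can be inspected in the same way; I would expect its weights to have changed after optimizer.step():

# For comparison: gradients and weights of the fully-connected layer,
# which is registered as a regular submodule via 'self.fc = nn.Linear(10, 2)'.
model.fc.weight.grad  # gradients computed by the backward pass
model.fc.weight       # weights after 'optimizer.step()'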
Given the initial weights (a tensor of ones) and the gradients, I expected the updated weights to be as below (using the standard SGD update rule with lr=1e-2):
# Expected values
weights = model.batch_norm_list[1].weight
gradients = model.batch_norm_list[1].weight.grad
lr = 1e-2
expected_weights = weights - lr * gradients
# tensor([0.9992, 0.9999, 0.9971, 0.9969, 0.9976, 0.9991, 0.9986, 0.9956, 1.0003,
#         0.9993], grad_fn=<SubBackward0>)
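To make the mismatch explicit, here is a small check (again, something I added on top of the snippet above) confirming that the BN weights have not moved at all:

# The BN weights are still exactly the initial tensor of ones,
# even though 'optimizer.step()' was called after 'loss.backward()'.
print(torch.equal(model.batch_norm_list[1].weight, torch.ones(10)))  # prints: True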
So, I would like to know why this issue occurs and how to solve it.