Optimizer changes value for a parameter that is showing no gradient

I’m trying to inspect the gradients of the parameters in my model. For one of them .grad shows nothing, yet the value still changes after optimizer.step(). Taken from pdb:

-> optimizer.step()
(Pdb) variableName
tensor(0., device='cuda:0', grad_fn=<SelectBackward>)
(Pdb) variableName.grad
(Pdb) next
> file.py(476)train()
-> nextLine
(Pdb) variableName
tensor(3.7592e-08, device='cuda:0', grad_fn=<SelectBackward>)

How is the optimizer changing the value when it is showing nothing for the .grad field?

Depending on the optimizer, parameters might be changed even with a zero gradient if running estimates are used, which would be the case for e.g. Adam.
Which optimizer are you using, and are you zeroing out the gradients manually after the backward pass?
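
As a quick illustration (a minimal sketch, not code from your training run): Adam will still move a parameter on a step where its gradient is zero, because the update also uses the running estimates built up on earlier steps.

import torch
import torch.nn as nn
import torch.optim as optim

# Adam keeps running estimates (exp_avg, exp_avg_sq), so a parameter can
# still change on a step where its gradient is exactly zero.
p = nn.Parameter(torch.ones(1))
opt = optim.Adam([p], lr=0.1)

# First step with a real gradient populates the running estimates.
p.grad = torch.ones(1)
opt.step()

# Second step with a zero gradient: the value still moves.
p.grad = torch.zeros(1)
before = p.detach().clone()
opt.step()
print(before.item(), p.item())  # the values differ despite grad == 0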

I’m just using SGD

optimizer = optim.SGD(net.parameters(), lr=args.lr, momentum=0, weight_decay=0)

I make a call to optimizer.zero_grad() at the start of the loop where the batch is loaded
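
As a sanity check on my understanding (a minimal sketch, separate from my actual code), plain SGD with momentum=0 and weight_decay=0 shouldn’t touch a parameter whose gradient is zero, which is why this confused me:

import torch
import torch.nn as nn
import torch.optim as optim

# Vanilla SGD: no momentum, no weight decay, gradient explicitly zero
p = nn.Parameter(torch.ones(1))
opt = optim.SGD([p], lr=0.1, momentum=0, weight_decay=0)

p.grad = torch.zeros(1)
before = p.detach().clone()
opt.step()
print(torch.equal(before, p.detach()))  # True: the parameter did not change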

Thanks for the update. Could you post a code snippet to reproduce this issue, please?

My code is super long and has a bunch of weird stuff going on. Last time I tried to make a minimal example for something, I was not able to. If it helps, this is where that particular parameter is created, which is probably being done in a weird way.

if numEpochs > 0:
    values = torch.cat((self.theParameter[numEpochs - 1],
                        nn.Parameter(torch.Tensor(1, self.planes).zero_().cuda() + multiplier)), 0)
    self.theParameter[numEpochs] = nn.Parameter(values.data.clone().cuda(), requires_grad=True)
    self.register_parameter('newestParameter', self.theParameter[numEpochs])
else:
    self.theParameter[0] = nn.Parameter(torch.Tensor(1, self.planes).zero_().cuda().clone(), requires_grad=True)
    self.register_parameter('newestParameter', self.theParameter[numEpochs])

and it is having this problem at the first epoch

I’m not sure how you are using newestParameter, but note that overriding it might create some unexpected results.
Maybe this code snippet helps:

import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.param = nn.Parameter(torch.randn(1))
        self.register_parameter('param1', self.param)

    def forward(self, x):
        x = self.param1 * x
        return x
    

model = MyModel()
print(dict(model.named_parameters()))

out = model(torch.randn(1))
out.backward()

print(model.param.grad)   # same tensor registered under two names -> same grad
print(model.param1.grad)


# next iteration: re-registering 'param1' binds the name to a *new* tensor,
# while self.param still refers to the old one
model.zero_grad()
model.register_parameter('param1', nn.Parameter(torch.randn(1)))
print(dict(model.named_parameters()))

out = model(torch.randn(1))
out.backward()

print(model.param.grad)   # the old tensor is no longer used in forward, so no new grad
print(model.param1.grad)  # only the newly registered tensor receives a gradient

Figured out my mistake. I was trying to print the grad of an indexed element rather than of the entire tensor. variableName in the original post was actually variableName[index1][index2][index3], where variableName is a dict, variableName[index1] is a tensor, and index2 and index3 select a specific row and column. It looks like .grad is only populated for the full tensor, not for an indexed slice of it. Other people looking at this might be able to confirm they made the same mistake if they see grad_fn=<SelectBackward>, which doesn’t show up when printing the full tensor.
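
For anyone who wants to verify, a minimal sketch (the names here are made up): .grad lives on the full leaf parameter, while indexing into it creates a new non-leaf tensor that carries grad_fn=<SelectBackward> and has no .grad of its own.

import torch
import torch.nn as nn

param = nn.Parameter(torch.randn(3, 4))
param.sum().backward()

print(param.grad)        # populated: the full parameter is a leaf tensor
print(param[1][2])       # shows grad_fn=<SelectBackward> (SelectBackward0 on newer versions)
print(param[1][2].grad)  # None (with a warning): the indexed element is not a leaf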