torch.nn.Parameter has None gradient

J_B_28 · January 1, 2023, 12:03am

Hello,

I am running a simple test on an MLP in which I replace the normal weight with a customized residual version of the weight. I used torch.nn.Parameter to achieve that:

```python
for i, lin in enumerate(self.lins[:-1]):
    if self.residual_weight and lin.weight.size(-1) == args.hidden:
        print("before", lin.weight.grad)
        lin.weight = torch.nn.Parameter(
            args.phi * torch.eye(args.hidden).to(device)
            + (1 - args.phi) * lin.weight.clone(),
            requires_grad=True,
        )
        print("after", lin.weight.grad)
    x = lin(x)
    x = self.bns[i](x)
    x = F.relu(x)
    x = F.dropout(x, p=args.dropout, training=self.training)
x = self.lins[-1](x)
```

The first print statement (“before”) outputs the gradient as expected, but the second one (“after”) always outputs None, even though I have set requires_grad=True. What should I do to solve this so that backpropagation works correctly? Thanks.

ptrblck · January 1, 2023, 12:13am

It seems you are recreating the parameter in the forward method. The new nn.Parameter is a fresh leaf tensor without any gradient history, so the None .grad attribute is expected in the "after" line of code.
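A minimal sketch of this behavior, assuming a standalone `nn.Linear(4, 4)` layer and a hypothetical `phi = 0.5` in place of `args.phi`. The mixed tensor is still connected to `lin.weight` through autograd, but wrapping it in `nn.Parameter` produces a new leaf tensor with no `grad_fn` and no `.grad` yet.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lin = nn.Linear(4, 4)
phi = 0.5  # hypothetical mixing factor, standing in for args.phi

# The residual mix is computed from the existing parameter, so it has autograd history.
mixed = phi * torch.eye(4) + (1 - phi) * lin.weight.clone()
print(mixed.grad_fn is not None)  # True: connected to lin.weight

# Wrapping it in nn.Parameter creates a brand-new leaf tensor without that history.
new_weight = nn.Parameter(mixed, requires_grad=True)
print(new_weight.is_leaf, new_weight.grad_fn, new_weight.grad)  # True None None
```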
The .grad attribute will be populated in the .backward call of the loss.
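To illustrate, here is a small sketch assuming a toy `nn.Linear(4, 4)` layer, a hypothetical `phi = 0.5`, and an arbitrary squared-output loss. The replaced weight receives a `.grad` only after `loss.backward()`, while the original parameter object gets none, because the new Parameter has no history linking back to it.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lin = nn.Linear(4, 4)
phi = 0.5  # hypothetical stand-in for args.phi

old_weight = lin.weight  # keep a handle on the original parameter object
# Replace the weight the same way as in the question (a fresh leaf Parameter).
lin.weight = nn.Parameter(phi * torch.eye(4) + (1 - phi) * old_weight.clone())
print("before backward:", lin.weight.grad)  # None

x = torch.randn(8, 4)
loss = lin(x).pow(2).mean()  # hypothetical toy loss
loss.backward()

print("after backward:", lin.weight.grad.shape)  # torch.Size([4, 4])
print("old weight grad:", old_weight.grad)  # None: no gradient flows back to it
```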
Note, however, that your optimizer might not update this parameter unless you pass it to the optimizer as a new param group.
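One way to register the new tensor is `Optimizer.add_param_group`. The sketch below assumes plain SGD with a hypothetical learning rate of 0.1, a toy `nn.Linear(4, 4)` layer, a hypothetical `phi = 0.5`, and an arbitrary squared-output loss. It checks that the swapped-in weight is not tracked by the optimizer until it is added, and that `step()` then changes it.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lin = nn.Linear(4, 4)
optimizer = torch.optim.SGD(lin.parameters(), lr=0.1)  # hypothetical lr
phi = 0.5  # hypothetical stand-in for args.phi

# Swap in a new Parameter after the optimizer was created.
lin.weight = nn.Parameter(phi * torch.eye(4) + (1 - phi) * lin.weight.clone())

# The optimizer still holds the old weight tensor, not the new one.
tracked = {id(p) for group in optimizer.param_groups for p in group["params"]}
print(id(lin.weight) in tracked)  # False

# Register the new parameter so optimizer.step() updates it (lr falls back to the default).
optimizer.add_param_group({"params": [lin.weight]})

before = lin.weight.detach().clone()
loss = lin(torch.randn(8, 4)).pow(2).mean()  # hypothetical toy loss
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(torch.equal(before, lin.weight.detach()))  # False: the new weight was updated
```

If the parameter is recreated on every forward pass, as in the question, a new group would be needed each time, so this sketch mainly shows why the new tensor is not updated by default.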