Hello,
I am running a simple test on an MLP in which I replace the normal weight with a customized residual version of the weight. I used torch.nn.Parameter to do that:
for i, lin in enumerate(self.lins[:-1]):
    if self.residual_weight and lin.weight.size(-1) == args.hidden:
        print("before", lin.weight.grad)
        lin.weight = torch.nn.Parameter(
            args.phi * torch.eye(args.hidden).to(device)
            + (1 - args.phi) * lin.weight.clone(),
            requires_grad=True)
        print("after", lin.weight.grad)
    x = lin(x)
    x = self.bns[i](x)
    x = F.relu(x)
    x = F.dropout(x, p=args.dropout, training=self.training)
x = self.lins[-1](x)
The first print statement ("before") outputs the gradient as expected, but the second one ("after") always outputs None, even though I have set requires_grad=True. What should I do so that backpropagation works correctly? Thanks.
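
For reference, here is a minimal self-contained sketch that reproduces the behavior outside my model. It assumes a single nn.Linear layer, and the names hidden, phi, lin2 and w_eff are hypothetical stand-ins rather than my real code. The first part shows that re-wrapping the weight in a new torch.nn.Parameter creates a fresh leaf tensor whose .grad is None until backward() runs on a graph that uses it. The second part is one variant I could try, not necessarily the right fix: keep the original parameter and build the residual weight functionally with F.linear, so the gradient flows back to lin2.weight.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
hidden, phi = 4, 0.5  # hypothetical stand-ins for args.hidden and args.phi
x = torch.randn(2, hidden)

# Part 1: reproduce the "after is None" behavior.
lin = nn.Linear(hidden, hidden)
lin(x).sum().backward()          # populate .grad, as an earlier step would
grad_before = lin.weight.grad    # a tensor

# Wrapping in nn.Parameter creates a new leaf tensor, detached from the old
# weight; it has not taken part in any backward pass yet.
lin.weight = nn.Parameter(
    phi * torch.eye(hidden) + (1 - phi) * lin.weight.clone(),
    requires_grad=True)
grad_after = lin.weight.grad     # None

# Part 2 (assumed variant): leave the parameter alone and compute the
# residual weight inside the forward pass.
lin2 = nn.Linear(hidden, hidden)
w_eff = phi * torch.eye(hidden) + (1 - phi) * lin2.weight
out = F.linear(x, w_eff, lin2.bias)
out.sum().backward()
grad_functional = lin2.weight.grad  # gradient reaches the original parameter

print("before:", grad_before is not None)
print("after:", grad_after)
print("functional:", grad_functional.shape)
```

In the second part the optimizer would still update lin2.weight itself, and the phi-weighted identity is re-applied on every forward pass instead of being baked into the stored weight.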