Gradients are None after torch.autograd.grad()

Hi, I'm having some issues with autograd. I'm trying to implement a network whose hidden-layer nodes take diverse noisy inputs, but I get stuck when computing the gradients w.r.t. all the parameters.

import torch
import torch.nn as nn
from torch.autograd import Variable

num_hid = 1
nr_classes = 10
w1 = torch.rand(1024, num_hid, requires_grad=True) - 0.5
w1_bias = torch.rand(num_hid, requires_grad=True) - 0.5
w2 = torch.rand(num_hid, nr_classes, requires_grad=True) - 0.5
w2_bias = torch.rand(nr_classes, requires_grad=True) - 0.5

noise_gt_data1 = GenerateNoisyData(n)
linear_output_first_layer = torch.matmul(noise_gt_data1.reshape(-1, 1024), w1) + w1_bias
sigmoid = nn.Sigmoid()
activation_output_first_layer = sigmoid(linear_output_first_layer)
out = Variable(torch.matmul(activation_output_first_layer, w2), requires_grad=True) + w2_bias

net = Net(num_hid)
net.body[0].weight.data = w1
net.body[0].bias.data = w1_bias
net.fc[0].weight.data = w2
net.fc[0].bias.data = w2_bias

y = criterion(out, gt_onehot_label)
gradients = torch.autograd.grad(y, net.parameters(), create_graph=True, retain_graph=True, allow_unused=True)
print(gradients)

It shows that the gradients are all None. I tried both backward() and autograd.grad(); neither works.

Hi,

A few things: you should not use Variable anymore; all Tensors can require gradients now.
And if the output of a function does not have requires_grad=True, it is because none of its inputs required gradients. So you should make sure that the inputs you want gradients for are set up properly.
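
For example, in a minimal sketch like this (the shapes and names are only illustrative, not your actual code), the output gets requires_grad=True automatically because the leaf Tensors w and b require gradients. (As a side note, something like torch.rand(..., requires_grad=True) - 0.5 makes the final value a non-leaf result of the subtraction; calling .requires_grad_() after building the value keeps the Tensor a proper leaf.)

import torch

w = torch.rand(1024, 1) - 0.5
w.requires_grad_()                   # mark the leaf after building its value
b = torch.rand(1) - 0.5
b.requires_grad_()

x = torch.rand(4, 1024)              # dummy input, no grad needed
out = torch.sigmoid(x @ w + b)       # out.requires_grad is True automatically
loss = out.sum()

grads = torch.autograd.grad(loss, [w, b])
print([g.shape for g in grads])      # torch.Size([1024, 1]), torch.Size([1])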

Also, you should not use .data, as it has many side effects that you most likely don’t want (and it will lead to silently wrong gradients in some cases).
If you want to set the weights of your net, you can do:

with torch.no_grad():
  net.body[0].weight.copy_(w1)
  net.body[0].bias.copy_(w1_bias)
  # etc

Really appreciate your help. I have removed Variable().
I initialize the hidden-layer weights accordingly, but when I compute the gradients w.r.t. net.parameters(), all the gradients are None. However, if I compute them w.r.t. [w1, w1_bias, w2, w2_bias], it works properly.
Is there any way to solve it?

As the torch.no_grad() makes clear, setting the values inside the net is not a differentiable operation, and w1 is not the same Tensor as net.body[0].weight. (Your previous code was breaking this link as well, just in a more subtle way, by using .data.)

I am afraid you cannot assign these in a differentiable manner. If you want gradients w.r.t. these parameters, you can set them beforehand and then compute the output with the net's own parameters.
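
If it helps, here is a minimal sketch of that pattern; the Sequential net, criterion and data below are only stand-ins for your own Net, criterion and GenerateNoisyData, just to show that the gradients come back non-None once the forward pass goes through the net itself:

import torch
import torch.nn as nn

num_hid, nr_classes = 1, 10

# Stand-in for your Net: a single hidden layer followed by a classifier.
net = nn.Sequential(
    nn.Linear(1024, num_hid),
    nn.Sigmoid(),
    nn.Linear(num_hid, nr_classes),
)
criterion = nn.CrossEntropyLoss()       # stand-in for your criterion

w1 = torch.rand(num_hid, 1024) - 0.5    # nn.Linear stores weight as (out_features, in_features)
w1_bias = torch.rand(num_hid) - 0.5

# 1) Copy the initial values into the net (not differentiable, and not meant to be).
with torch.no_grad():
    net[0].weight.copy_(w1)
    net[0].bias.copy_(w1_bias)
    # etc for the classifier layer

# 2) Compute the output with the net itself, so the graph contains net.parameters().
x = torch.rand(4, 1024)                       # dummy noisy input
label = torch.randint(0, nr_classes, (4,))    # dummy targets
y = criterion(net(x), label)

grads = torch.autograd.grad(y, net.parameters())
print([g is None for g in grads])             # all False now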

Solved it the other way, thanks :)