Gradients are None after torch.autograd.grad()

Hi, I'm having some issues with autograd. I'm trying to implement a network whose hidden-layer nodes take diverse noisy inputs, but I get stuck when computing the gradients w.r.t. all the parameters.

import torch
import torch.nn as nn
from torch.autograd import Variable

num_hid = 1
nr_classes = 10
w1 = torch.rand(1024, num_hid, requires_grad=True) - 0.5
w1_bias = torch.rand(num_hid, requires_grad=True) - 0.5
w2 = torch.rand(num_hid, nr_classes, requires_grad=True) - 0.5
w2_bias = torch.rand(nr_classes, requires_grad=True) - 0.5

noise_gt_data1 = GenerateNoisyData(n)
linear_output_first_layer = torch.matmul(noise_gt_data1.reshape(-1, 1024), w1) + w1_bias
sigmoid = nn.Sigmoid()
activation_output_first_layer = sigmoid(linear_output_first_layer)
out = Variable(torch.matmul(activation_output_first_layer, w2), requires_grad=True) + w2_bias

net = Net(num_hid)
net.body[0].weight.data = w1
net.body[0].bias.data = w1_bias
net.fc[0].weight.data = w2
net.fc[0].bias.data = w2_bias

y = criterion(out, gt_onehot_label)
gradients = torch.autograd.grad(y, net.parameters(), create_graph=True, retain_graph=True, allow_unused=True)
print(gradients)

It shows that the gradients are all None. I tried both backward() and autograd.grad(); neither works.

Hi,

A few things: you should not use Variable anymore; all Tensors can require gradients now.
And if the output of a function does not have requires_grad=True, it is because none of its inputs required gradients. So you should make sure that the inputs you want gradients for are set up properly.
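
For example, in a minimal sketch like this (the shapes and names are only illustrative, not your actual code), the output gets requires_grad=True automatically because the leaf Tensors w and b require gradients. (As a side note, something like torch.rand(..., requires_grad=True) - 0.5 makes the final value a non-leaf result of the subtraction; calling .requires_grad_() after building the value keeps the Tensor a proper leaf.)

import torch

w = torch.rand(1024, 1) - 0.5
w.requires_grad_()                   # mark the leaf after building its value
b = torch.rand(1) - 0.5
b.requires_grad_()

x = torch.rand(4, 1024)              # dummy input, no grad needed
out = torch.sigmoid(x @ w + b)       # out.requires_grad is True automatically
loss = out.sum()

grads = torch.autograd.grad(loss, [w, b])
print([g.shape for g in grads])      # torch.Size([1024, 1]), torch.Size([1])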

Also, you should not use .data, as it has many side effects that you most likely don’t want (and it will lead to silently wrong gradients in some cases).
If you want to set the weights of your net, you can do:

with torch.no_grad():
  net.body[0].weight.copy_(w1)
  net.body[0].bias.copy_(w1_bias)
  # etc

Really appreciate your help. I have removed Variable().
I initialize the hidden-layer weights accordingly, but when I compute the gradients w.r.t. net.parameters(), all the gradients are None. However, if I compute them w.r.t. [w1, w1_bias, w2, w2_bias], it works properly.
Is there any way to solve it?

As the torch.no_grad() makes clear, setting the values inside the net is not a differentiable operation, and w1 is not the same Tensor as net.body[0].weight. (Your previous code was breaking this link as well, just in a more subtle way, by using .data.)

I am afraid you cannot assign these in a differentiable manner. If you want gradients w.r.t. these parameters, you can set them beforehand and then compute the output with the net's own parameters.
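
If it helps, here is a minimal sketch of that pattern; the Sequential net, criterion and data below are only stand-ins for your own Net, criterion and GenerateNoisyData, just to show that the gradients come back non-None once the forward pass goes through the net itself:

import torch
import torch.nn as nn

num_hid, nr_classes = 1, 10

# Stand-in for your Net: a single hidden layer followed by a classifier.
net = nn.Sequential(
    nn.Linear(1024, num_hid),
    nn.Sigmoid(),
    nn.Linear(num_hid, nr_classes),
)
criterion = nn.CrossEntropyLoss()       # stand-in for your criterion

w1 = torch.rand(num_hid, 1024) - 0.5    # nn.Linear stores weight as (out_features, in_features)
w1_bias = torch.rand(num_hid) - 0.5

# 1) Copy the initial values into the net (not differentiable, and not meant to be).
with torch.no_grad():
    net[0].weight.copy_(w1)
    net[0].bias.copy_(w1_bias)
    # etc for the classifier layer

# 2) Compute the output with the net itself, so the graph contains net.parameters().
x = torch.rand(4, 1024)                       # dummy noisy input
label = torch.randint(0, nr_classes, (4,))    # dummy targets
y = criterion(net(x), label)

grads = torch.autograd.grad(y, net.parameters())
print([g is None for g in grads])             # all False now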

Solved it the other way, thanks :)