Grad is None even though requires_grad=True after adding retain_grad()

Hello guys~~ I cannot get the grad of the input of my net. I have tried a lot of solutions from the forum, but none of them work. Below is part of my code; can anybody help? Thanks a lot!

X_emb = Variable(X_emb.data,requires_grad=True)
X_emb = X_emb.to(device)
print(X_emb.dtype)
#X_emb.requires_grad_()
print(X_emb.requires_grad)
X_emb.retain_grad()
T = opt.T
epsilon = opt.epsilon

net.zero_grad()
#y_hat = net(X,seq_lengths)
y_hat = net_rest1(net,X_emb,seq_lengths)
y_hat.requires_grad_()
print(y_hat.requires_grad)
_, pred_idx = torch.max(y_hat.data, 1)
labels = Variable(pred_idx)
y_hat = y_hat / T
y_hat.requires_grad_()
print(y_hat.requires_grad)
loss = xent(y_hat,labels)
loss.requires_grad_()
X_emb.retain_grad()
loss.backward()

print(type(X_emb))
print(X_emb.grad)
X_emb = X_emb - epsilon * torch.sign(X_emb.grad) # the error occurs here because X_emb.grad is None


loss.backward() runs fine, but I can’t get the grad. I am very confused by this result…

Your Variable X_emb has no value, so how can its grad be calculated?

This is just part of my code; it definitely has a value…

Hi,

The issue is that you have some non-differentiable operations.
More generally, you should never need to call .requires_grad_() except on the Tensor for which you want .grad to be populated.
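
As a standalone illustration (a minimal sketch, independent of your model), .grad is only populated for a leaf Tensor that required grad before the forward pass:

import torch

x = torch.randn(4)                      # leaf Tensor, requires_grad=False by default
w = torch.randn(4, requires_grad=True)  # leaf Tensor that requires grad
loss = (x * w).sum()
loss.backward()
print(w.grad)   # populated: w is a leaf that requires grad
print(x.grad)   # None: x never required grad

x = torch.randn(4, requires_grad=True)  # make the input require grad *before* the forward pass
loss = (x * w).sum()
loss.backward()
print(x.grad)   # now populated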

Here is the updated code with comments:

# If you want to break the graph, use `.detach()`
X_emb = X_emb.detach()
# Move to the right device
X_emb = X_emb.to(device)
# The Tensor is still a leaf since it does not require gradients. Make it require them now
X_emb.requires_grad_()
# No need for retain_grad() as it is already a leaf

T = opt.T
epsilon = opt.epsilon

net.zero_grad()
#y_hat = net(X,seq_lengths)
y_hat = net_rest1(net,X_emb,seq_lengths)
# y_hat.requires_grad_()  # don't add extra requires_grad_() calls; they are never needed
_, pred_idx = torch.max(y_hat.data, 1)
labels = pred_idx
y_hat = y_hat / T
print(y_hat.requires_grad)
loss = xent(y_hat,labels)
# loss.requires_grad_()
loss.backward()

print(type(X_emb))
print(X_emb.grad)
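
Once X_emb.grad is populated, the sign-gradient step from your original snippet can be applied; a sketch (reusing your epsilon, and wrapped in torch.no_grad() so the update itself is not tracked):

with torch.no_grad():
    # X_emb_adv is just an illustrative name for the perturbed input
    X_emb_adv = X_emb - epsilon * torch.sign(X_emb.grad)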

If a Tensor in the middle of the graph does not require gradients even though the input to the op that produced it does, that means the op is not differentiable, so no gradient can flow back through it.
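
For example (a minimal sketch of such a break in the graph):

import torch

x = torch.randn(3, requires_grad=True)
y = x * 2
print(y.requires_grad)    # True: multiplication is differentiable

_, idx = torch.max(y, 0)  # the returned indices are integers
print(idx.requires_grad)  # False: argmax is not differentiable

z = y.data                # .data also silently detaches from the graph
print(z.requires_grad)    # False: nothing can flow back through z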