I am getting a grad value of None for the following two variables after the backward pass. I understand that they may not be leaf variables, so I called retain_grad() on them before calling backward(), but it did not help.
ranked_tensor[idx] # Variable containing: [torch.cuda.FloatTensor of size 1 (GPU 0)]
ranked_tensor[nonzeroidx.data] # Variable containing: [torch.cuda.FloatTensor of size 1 (GPU 0)]
for idx in range(nonzeroidx.data - 1, -1, -1):
    logistic = self.computelogistic(...)  # arguments elided in the original post
    lambdaij = 1.0 / (idx + 1) - 1.0 / (nonzeroidx.data + 1)
    logistic *= lambdaij
    h = ranked_tensor[nonzeroidx.data].register_hook(lambda grad: grad * logistic)
    ranked_tensor[idx].retain_grad()  # ranked_tensor[idx].requires_grad is True
    h = ranked_tensor[idx].register_hook(lambda grad: grad * -logistic)
    ranked_tensor[idx].backward(retain_graph=True)  # ranked_tensor[idx].grad is None!!
Is ranked_tensor a list or a Tensor? If it's a Tensor, then ranked_tensor[idx] creates a new Tensor every time you call it. This means that operations that change the current Tensor (like retain_grad()) will be applied to that new Tensor. In your case, the Tensor on which you call retain_grad() and the one where you check the gradients are not the same.
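A minimal sketch of what is described above (the tensor names here are illustrative, not from the original post): two consecutive indexing operations on the same tensor yield two distinct Tensor objects, so retain_grad() on one temporary has no effect on the next.

```python
import torch

t = torch.ones(3, requires_grad=True) * 2  # a non-leaf tensor

# Each indexing operation creates a fresh Tensor:
a = t[0]
b = t[0]
print(a is b)  # False: two distinct Tensor objects

t[0].retain_grad()   # applies to a temporary that is immediately discarded
t.sum().backward()
print(t[0].grad)     # None: this t[0] is yet another new Tensor
```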
@albanD Hello, and thanks for the comment. Calling ranked_tensor[idx] in multiple places was indeed the problem. I would not have guessed that indexing a variable without assignment creates a new variable each time.
I changed my code by defining two variables at the top of the for loop so that register_hook(), retain_grad() and backward() will all be called on the same variable:
ranked_tensori = ranked_tensor[nonzeroidx.data]
ranked_tensorj = ranked_tensor[idx]
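With that pattern, a self-contained sketch of the fix (here `scores` is a hypothetical stand-in for whatever produces ranked_tensor, and the index 3 and hook scale are arbitrary): index once, keep the handle, and call everything on it.

```python
import torch

torch.manual_seed(0)
scores = torch.randn(5, requires_grad=True)   # hypothetical leaf tensor
ranked_tensor = scores * 2.0                  # non-leaf, as in the post

# Index ONCE and reuse the handle, so retain_grad(), register_hook()
# and backward() all see the same Tensor object:
ranked_tensorj = ranked_tensor[3]
ranked_tensorj.retain_grad()
h = ranked_tensorj.register_hook(lambda g: g * -0.5)

ranked_tensorj.backward(retain_graph=True)
print(ranked_tensorj.grad is None)   # False: the gradient is retained now
print(scores.grad[3])                # the hook scaled the upstream gradient
h.remove()
```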
This code was working until I refactored some unrelated code, now I am getting the following error:
raise RuntimeError("cannot register a hook on a volatile variable")
You cannot call retain_grad() (or register a hook) on a variable if you run with volatile=True. I guess your evaluation code has been changed? Note that you can use torch.enable_grad() to enforce grad computation locally.
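For example, with the modern API (torch.no_grad() replaces the old volatile flag; the tensor names here are illustrative), torch.enable_grad() re-enables autograd inside a no-grad region:

```python
import torch

x = torch.ones(3, requires_grad=True)

with torch.no_grad():              # e.g. an evaluation loop
    y1 = x * 2
    with torch.enable_grad():      # re-enable autograd locally
        y2 = x * 2

print(y1.requires_grad)  # False: no graph was built here
print(y2.requires_grad)  # True: hooks and backward() work on y2
```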
You are correct. My validation code needed tweaking after refactoring the training loop.