Grad is None after backward pass

Anurag · May 5, 2018, 8:59am

I am getting a grad value of None for the following two variables after the backward pass. I understand that they may not be leaf variables, so I called retain_grad() on them before calling backward(), but it did not help.

ranked_tensor[idx] # Variable containing: [torch.cuda.FloatTensor of size 1 (GPU 0)]
ranked_tensor[nonzeroidx.data[0][0]] # Variable containing: [torch.cuda.FloatTensor of size 1 (GPU 0)]

    for idx in range(nonzeroidx.data[0][0] - 1, -1, -1):
        logistic = self.computelogistic(
            ranked_tensor[nonzeroidx.data[0][0]],
            ranked_tensor[idx])
        lambdaij = 1.0 / (idx + 1) - 1.0 / (nonzeroidx.data[0][0] + 1)
        logistic *= lambdaij
        ranked_tensor[nonzeroidx.data[0][0]].retain_grad()
        h = ranked_tensor[nonzeroidx.data[0][0]].register_hook(
            lambda grad: grad * logistic)
        ranked_tensor[nonzeroidx.data[0][0]].backward(retain_graph=True)
        h.remove()
        ranked_tensor[idx].retain_grad()  # ranked_tensor[idx].requires_grad is True
        h = ranked_tensor[idx].register_hook(
            lambda grad: grad * -logistic)
        ranked_tensor[idx].backward(retain_graph=True)  # ranked_tensor[idx].grad is None!!
        h.remove()

albanD · May 5, 2018, 12:07pm

Hi, is ranked_tensor a list or a Tensor? If it's a Tensor, ranked_tensor[idx] creates a new Tensor object every time you evaluate it. This means that anything that changes state on the Tensor itself (like retain_grad()) is applied to that new, temporary Tensor. In your case, the Tensor on which you call retain_grad() and the one on which you check the gradient are not the same object.
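
To make the point above concrete, here is a minimal sketch. It assumes a recent PyTorch release with the plain Tensor API rather than the Variable API used in the question, and the names x and ranked are made up for illustration. Each indexing expression returns a fresh Tensor object, so retain_grad() on one of them leaves no trace on the next.

```python
import torch

x = torch.ones(3, requires_grad=True)   # hypothetical leaf tensor
ranked = x * 2.0                         # non-leaf, like ranked_tensor in the question

# Two indexing expressions give two distinct Python objects.
same_object = ranked[0] is ranked[0]     # False

# Pattern from the question: every ranked[0] below is a different temporary.
ranked[0].retain_grad()
ranked[0].backward(retain_graph=True)
lost = ranked[0].grad                    # None (PyTorch may also emit a non-leaf .grad warning)

# Keeping a single reference makes retain_grad(), backward() and .grad agree.
a = ranked[0]
a.retain_grad()
a.backward(retain_graph=True)
kept = a.grad                            # populated, because it is the same object throughout
```

The only difference between the two halves is whether the indexed result is stored in a name before it is used.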

Anurag · May 6, 2018, 2:30am

@albanD Hello, and thanks for the comment. Calling ranked_tensor[idx] in multiple places was indeed the problem. I would not have guessed that indexing a variable without assignment creates a new variable.

Anurag · May 7, 2018, 9:05am

I changed my code by defining two variables at the top of the for loop so that register_hook(), retain_grad() and backward() are all called on the same variable:

                    ranked_tensori = ranked_tensor[nonzeroidx.data[0][0]]
                    ranked_tensorj = ranked_tensor[idx]
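
For reference, a rough sketch of how the loop body might look with those two variables, written against a recent PyTorch Tensor API. The tensor scores, the indices i and j, and the constant weight (standing in for logistic * lambdaij) are all hypothetical, not taken from my real code.

```python
import torch

scores = torch.tensor([0.5, 1.5, -0.3], requires_grad=True)  # hypothetical leaf
ranked = scores * 1.0                                         # non-leaf stand-in for ranked_tensor

i, j = 2, 0        # hypothetical positions (nonzeroidx and idx in the original)
weight = 0.25      # hypothetical stand-in for logistic * lambdaij

# Index once, then reuse the same objects for every call.
ranked_i = ranked[i]
ranked_j = ranked[j]

for t, sign in ((ranked_i, 1.0), (ranked_j, -1.0)):
    t.retain_grad()
    # The hook rescales the gradient flowing through t on its way to the leaf.
    h = t.register_hook(lambda grad, s=sign: grad * s * weight)
    t.backward(retain_graph=True)
    h.remove()

leaf_grad = scores.grad   # gradient accumulated on the leaf from both backward calls
```

Since both backward() calls accumulate into the same leaf, scores.grad ends up holding +weight at position i and -weight at position j.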

This change worked until I refactored some unrelated code; now I am getting the following error:
raise RuntimeError("cannot register a hook on a volatile variable")

albanD · May 7, 2018, 9:11am

You cannot call retain_grad() on a variable if you run under torch.no_grad().
I guess your evaluation code has been changed?
Note that you can use torch.enable_grad() to enforce grad computation locally.
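
As a small illustration of that last note, assuming a recent PyTorch release (the tensor w and the arithmetic are made up): results computed under torch.no_grad() do not track gradients, while a nested torch.enable_grad() block turns tracking back on locally.

```python
import torch

w = torch.ones(2, requires_grad=True)   # hypothetical parameter

with torch.no_grad():
    y_off = (w * 3.0).sum()             # no graph is recorded here
    with torch.enable_grad():
        y_on = (w * 3.0).sum()          # graph is recorded again inside this block
        y_on.backward()                 # gradients flow back to w

tracked_outside = y_off.requires_grad   # False
tracked_inside = y_on.requires_grad     # True
```

Calling retain_grad() or register_hook() on y_off would fail because it does not require grad, whereas y_on behaves like a normal training-time tensor.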

Anurag · May 7, 2018, 9:47am

You are correct. My validation code needed tweaking after refactoring the training loop.