Gradient is None when using torch.sum() or torch.abs()

cijerezg · May 22, 2020, 1:14am

I have the following code:

for count, params in enumerate(agent.actor.parameters()):
     print(params.grad)

where agent is an instance of a class that holds the neural network actor. If I run that code, it prints the gradients, a tensor of size 4x5. Now, if I do something like this:

for count, params in enumerate(agent.actor.parameters()):
     print(torch.sum(params.grad))

or even this

for count, params in enumerate(agent.actor.parameters()):
     print(torch.abs(params.grad))

Then I get the error: TypeError: sum(): argument 'input' (position 1) must be Tensor, not NoneType

The error is clear, but why do the gradients become None when I try to do something with them, and how do I correct that? This is a pretty weird error. Moreover, I had done something similar when the neural net was not part of a class but declared explicitly, that is:

for count, params in enumerate(actor.parameters()):
     print(torch.abs(params.grad))

where again, actor is a neural net that was explicitly declared (it’s coming from an imported class)

I’d appreciate any help with this! I know this is very weird. I can provide more details about the agent class if necessary, but I don’t think that’s the problem, because it works when I just print the gradients themselves.

ptrblck · May 22, 2020, 9:09am

Operations such as torch.sum and torch.abs shouldn’t change this behavior, so I assume you ran the code snippets in different parts of your original code.

Before the first backward call, all .grad attributes are set to None.
After the gradients have been calculated for the first time, or after you’ve called .zero_grad() on the model or optimizer, the .grad attributes will be filled with values or zeros, respectively.
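A minimal sketch of that behavior, in case it helps. The nn.Linear(5, 4) layer is only an assumed stand-in for the actor (chosen so the weight gradient is 4x5, as in the question), and the random input is purely illustrative:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the actor network: 5 inputs -> 4 outputs,
# so the weight (and its gradient) has shape 4x5.
actor = nn.Linear(5, 4)

# Before any backward call, every .grad attribute is None.
grads_before = [p.grad for p in actor.parameters()]

# One forward/backward pass populates .grad with real tensors.
loss = actor(torch.randn(2, 5)).sum()
loss.backward()
grads_after = [p.grad for p in actor.parameters()]

print(all(g is None for g in grads_before))
print([tuple(g.shape) for g in grads_after])
```

One caveat, as far as I know: in recent PyTorch releases .zero_grad() defaults to set_to_none=True, so there it resets .grad to None rather than filling it with zeros unless you pass set_to_none=False.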

Let me know if you get stuck, and feel free to post a reproducible code snippet.

cijerezg · May 22, 2020, 3:05pm

Thanks! I just realized what the problem was. Your answer was pretty helpful. Basically, during my first iteration I don’t call .backward(), so the gradients are None. I only call .backward() from the second iteration on. That’s why printing the gradients works, but torch.sum() fails on the first iteration.
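For anyone landing here with the same error, one possible workaround is to skip parameters whose .grad is still None. This is only a sketch: the nn.Linear(5, 4) layer stands in for the actor, and grad_abs_sums is a hypothetical helper name, not part of PyTorch:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the actor network.
actor = nn.Linear(5, 4)

def grad_abs_sums(model):
    """Return sum(|grad|) per parameter, or None where no gradient exists yet."""
    return [
        None if p.grad is None else p.grad.abs().sum().item()
        for p in model.parameters()
    ]

# First iteration: no backward call yet, so there is nothing to sum.
first = grad_abs_sums(actor)

# Later iteration: backward has run, so every .grad is a tensor.
actor(torch.ones(1, 5)).sum().backward()
later = grad_abs_sums(actor)

print(first)
print(later)
```

The explicit `is None` check keeps the loop safe on iterations where .backward() has not been called, instead of raising the TypeError from the original post.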