Grad is None when using a custom loss function for an A2C implementation


I am implementing an A2C algorithm. The loss function for the critic is simply the advantage function, and that works.
For the actor I use this:

loss = -distribution.log_prob(sample)*advantage.detach()
loss.requires_grad = True

where advantage is calculated using the critic and sample is drawn from the distribution I am trying to learn.

variance_energy = 0.01
variance_decision = 0.01
mean_matrix = torch.FloatTensor([mean_energy, mean_decision])
covariance_matrix = torch.FloatTensor([[variance_energy, 0],[0, variance_decision]])
distribution = MultivariateNormal(mean_matrix, covariance_matrix)
sample = distribution.sample()

Here, mean_energy and mean_decision are the outputs of my NN and are restricted to positive values (so I don’t have problems with the log).
The gradient is always None.
I have already checked that requires_grad = True and is_leaf = True for the loss as well.

I calculate the gradients of the weights manually (so I can update them that way) after calling loss.backward() with:

 for p in self.model.parameters():

Does anybody have an idea how to troubleshoot this? Thanks in advance!

Hi Codeflux!

advantage.detach() “breaks the computation graph,” and
loss.requires_grad = True doesn’t repair the damage.

Regardless of whether your code can be tweaked to (appear to)
work, loss must be usefully differentiable with respect to the
parameters you are trying to train in order for backpropagation
and training to work.
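One way to check whether loss is still connected to the parameters is to compare two ways of building the mean. The sketch below is only an illustration under assumptions: policy, state, and the fixed advantage value are hypothetical stand-ins, and it assumes mean_energy and mean_decision are tensors produced by the network. Building a new tensor from their values copies the numbers and drops the graph, while torch.stack keeps it. In this sketch, detaching the advantage only cuts the path through the critic; the path through log_prob remains.

```python
import torch
from torch.distributions import MultivariateNormal

torch.manual_seed(0)

# Hypothetical stand-in for the policy network: 3 inputs -> 2 means.
policy = torch.nn.Linear(3, 2)
state = torch.randn(3)
out = policy(state)
mean_energy, mean_decision = out[0], out[1]

covariance_matrix = 0.01 * torch.eye(2)
advantage = torch.tensor(1.5)  # placeholder for the critic-based advantage

# Variant A: rebuild the mean from plain numbers. The values are copied,
# so the result has no connection to the policy parameters.
mean_cut = torch.tensor([mean_energy.item(), mean_decision.item()])
dist_cut = MultivariateNormal(mean_cut, covariance_matrix)
loss_cut = -dist_cut.log_prob(dist_cut.sample()) * advantage.detach()
print(loss_cut.requires_grad)  # False: nothing to backpropagate through

# Variant B: torch.stack is a differentiable op, so the graph is kept.
mean_kept = torch.stack([mean_energy, mean_decision])
dist_kept = MultivariateNormal(mean_kept, covariance_matrix)
loss_kept = -dist_kept.log_prob(dist_kept.sample()) * advantage.detach()
loss_kept.backward()

grads_present = all(p.grad is not None for p in policy.parameters())
print(grads_present)  # True: gradients reached the policy parameters
```

If loss.requires_grad is False before you set it by hand, the graph was cut somewhere upstream, and that is the place to look.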


K. Frank

Thanks for the reply! It turns out I need neither loss.requires_grad = True nor the .detach() call on the advantage function for the loss function to have grad enabled.
But now I receive the error message:

RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.

Seems like backpropagation goes through my network two times?
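One common way to hit this error, assuming the critic loss and the actor loss are backpropagated separately while both depend on the same undetached advantage, is reproduced in the sketch below. The critic network, actor_param, and the placeholder log-probability are hypothetical and only stand in for the real model. The first backward() frees the critic's saved tensors, so the second backward() fails when it reaches them through advantage.

```python
import torch

torch.manual_seed(0)

# Hypothetical tiny critic and a single stand-in actor parameter.
critic = torch.nn.Sequential(
    torch.nn.Linear(3, 4), torch.nn.Tanh(), torch.nn.Linear(4, 1)
)
actor_param = torch.zeros(1, requires_grad=True)


def build_losses(detach_advantage):
    state = torch.randn(3)
    reward = torch.tensor([1.0])
    advantage = reward - critic(state)
    # Placeholder for distribution.log_prob(sample).
    log_prob = -(actor_param - 0.5).pow(2)
    critic_loss = advantage.pow(2).mean()
    adv_for_actor = advantage.detach() if detach_advantage else advantage
    actor_loss = -(log_prob * adv_for_actor).mean()
    return critic_loss, actor_loss


# Without detach: both losses share the critic's part of the graph.
critic_loss, actor_loss = build_losses(detach_advantage=False)
critic_loss.backward()
try:
    actor_loss.backward()
    second_backward_failed = False
except RuntimeError:
    second_backward_failed = True
print(second_backward_failed)  # True

# With detach: the actor loss no longer reaches into the critic's graph.
critic.zero_grad()
actor_param.grad = None
critic_loss, actor_loss = build_losses(detach_advantage=True)
critic_loss.backward()
actor_loss.backward()
print(actor_param.grad is not None)  # True
```

Under these assumptions, keeping advantage.detach() in the actor loss (or summing both losses and calling backward() once) avoids the second pass through the critic.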

I’ve also tried something else: removing the advantage function (which is of course not correct for actor-critic). Then for some reason the loss function does not have requires_grad set, and setting requires_grad = True results in the gradient of my weights being zero again.
is_leaf is True again, though.

I think something is already wrong here.

It appears that .backward() does not work with MultivariateNormal. When I use Normal instead, for just one parameter, I can calculate the gradient.

Can anyone confirm if this is true?

Hi Codeflux!

I believe that MultivariateNormal and Normal have the same
behavior in this regard.

In general, you can’t differentiate or backpropagate through calling
.sample() on a Distribution. This is true for Normal, as well as
MultivariateNormal. In contrast, you typically can backpropagate
through non-sampling methods such as .log_prob().

Here are some illustrative results:

>>> import torch
>>> torch.__version__
>>> meanN = torch.zeros (1, requires_grad = True)
>>> stdN = torch.ones (1, requires_grad = True)
>>> distN = torch.distributions.Normal (meanN, stdN)
>>> xN = distN.sample()
>>> xN.grad_fn
>>> yN = distN.log_prob (torch.zeros (1))
>>> yN.grad_fn
<SubBackward0 object at 0x00000188F9459CF8>
>>> yN.backward()
>>> meanN.grad
>>> stdN.grad
>>> meanMV = torch.zeros (1, requires_grad = True)
>>> stdMV = torch.eye (1, requires_grad = True)
>>> distMV = torch.distributions.MultivariateNormal (meanMV, stdMV)
>>> xMV = distMV.sample()
>>> xMV.grad_fn
>>> yMV = distMV.log_prob (torch.zeros (1))
>>> yMV.grad_fn
<SubBackward0 object at 0x00000188FA12CA58>
>>> yMV.backward()
>>> meanMV.grad
>>> stdMV.grad
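As a complement to the session above, the following sketch contrasts .sample() with .rsample(), the reparameterized sampler that both Normal and MultivariateNormal provide. The mean and covariance values are illustrative assumptions, loosely modeled on the question. For a policy-gradient loss of the form -log_prob(sample) * advantage, .rsample() is not required, because the gradient flows through .log_prob() rather than through the sample itself.

```python
import torch
from torch.distributions import MultivariateNormal

torch.manual_seed(0)

mean = torch.zeros(2, requires_grad=True)
cov = 0.01 * torch.eye(2)
dist = MultivariateNormal(mean, cov)

x_sample = dist.sample()    # drawn without tracking gradients
x_rsample = dist.rsample()  # mean + scale_tril @ noise, tracked by autograd

print(x_sample.grad_fn)                # None
print(x_rsample.grad_fn is not None)   # True

# The reparameterized sample is mean plus a noise term that does not
# depend on mean, so d(sum)/d(mean) is a vector of ones.
x_rsample.sum().backward()
print(mean.grad)  # tensor([1., 1.])
```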


K. Frank
