# Grad None when using custom loss function for a2c implementation

Hi,

I am implementing an A2C algorithm. The loss function for the critic is simply the advantage function, and that part works.
For the actor I use this:

```python
loss = -distribution.log_prob(sample) * advantage.detach()
loss.backward()
```

where `advantage` is calculated using the critic and `sample` comes from sampling the distribution I am trying to learn:

```python
variance_energy = 0.01
variance_decision = 0.01
mean_matrix = torch.FloatTensor([mean_energy, mean_decision])
covariance_matrix = torch.FloatTensor([[variance_energy, 0], [0, variance_decision]])
distribution = MultivariateNormal(mean_matrix, covariance_matrix)
sample = distribution.sample()
```
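For comparison, here is a minimal, self-contained version of this pattern (the network, state, and advantage value are made up for illustration); when the means come from a network, gradients do reach its parameters through `log_prob`:

```python
import torch
from torch.distributions import MultivariateNormal

# Toy actor network; names and sizes are illustrative only.
actor = torch.nn.Linear(4, 2)
state = torch.randn(4)

# Means come from the network; fixed diagonal covariance as in the post.
mean_matrix = actor(state)
covariance_matrix = torch.diag(torch.tensor([0.01, 0.01]))

distribution = MultivariateNormal(mean_matrix, covariance_matrix)
sample = distribution.sample()        # non-differentiable draw
advantage = torch.tensor(1.5)         # stand-in for the critic's output

loss = -distribution.log_prob(sample) * advantage.detach()
loss.backward()

print(actor.weight.grad is not None)  # gradients flow to the actor via log_prob
```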

Here, mean_energy and mean_decision are the outputs of my NN and are restricted to positive values (so I don't have problems with the log).
I have already checked that requires_grad = True and is_leaf = True for the loss as well.

I calculate the gradients of the weights manually (so I can update them that way) after calling loss.backward() with:

```python
for p in self.model.parameters():
    ...  # (loop body omitted in the original post; p.grad holds each gradient)
```

Does anybody have an idea how to troubleshoot this? Thanks in advance!

Hi Codeflux!

`advantage.detach()` "breaks the computation graph," and
`loss.requires_grad = True` doesn't repair the damage.

Regardless of whether your code can be tweaked to (appear to)
work, `loss` must be usefully differentiable with respect to the
`parameters` you are trying to train in order for backpropagation
and training to work.
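A tiny sketch of that point: once `detach()` cuts the graph, flipping `requires_grad` back on makes `backward()` run without error, but no gradient reaches the parameter:

```python
import torch

w = torch.ones(1, requires_grad=True)  # parameter we want to train
y = (2 * w).detach()                   # detach() cuts the graph here
loss = y.sum()
loss.requires_grad = True              # lets backward() run, but...
loss.backward()
print(w.grad)                          # None -- nothing flows back to w
```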

Best.

K. Frank

Thanks for the reply! It turns out I don't need `loss.requires_grad = True` or the `.detach()` call on the advantage function for the loss to have grad enabled.
But now I receive the error message:

```
RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.
```

It seems like backpropagation goes through my network twice?
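(That error typically means a second `backward()` re-traverses a graph that the first one already freed, e.g. when the actor and critic losses share intermediate results, or a distribution built in an earlier forward pass is reused. A minimal sketch of the failure mode and one fix, recomputing the forward pass before each backward, with a hypothetical stand-in network:)

```python
import torch

# Hypothetical stand-in for a policy/critic network.
net = torch.nn.Linear(2, 1)
x = torch.randn(2)

out = net(x)
out.sum().backward()         # first backward() frees the saved graph

failed = False
try:
    out.sum().backward()     # reuses the freed graph -> RuntimeError
except RuntimeError as err:
    failed = True
    print("second backward failed:", err)

# Fix: redo the forward pass so each backward() gets a fresh graph.
out = net(x)
out.sum().backward()         # works
```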

I've also tried something else: removing the advantage function (which is of course not correct for actor-critic). Then, for some reason, the loss does not have grad enabled, and setting requires_grad = True results in the gradients of my weights being zero again.
is_leaf is True again, though.

I think something is already wrong here.

It appears that `.backward()` does not work with the `MultivariateNormal` distribution... when I use `Normal` instead for just one parameter, I can calculate the gradient.

Can anyone confirm if this is true?

Hi Codeflux!

I believe that `MultivariateNormal` and `Normal` have the same
behavior in this regard.

In general, you can't differentiate or backpropagate through calling
`.sample()` on a `Distribution`. This is true for `Normal` as well as
`MultivariateNormal`. In contrast, you typically can backpropagate
through non-sampling methods such as `.log_prob()`.

Here are some illustrative results:

```
>>> import torch
>>> torch.__version__
'1.9.0'
>>>
>>> meanN = torch.zeros (1, requires_grad = True)
>>> stdN = torch.ones (1, requires_grad = True)
>>> distN = torch.distributions.Normal (meanN, stdN)
>>>
>>> xN = distN.sample()
>>> xN.grad_fn              # no grad_fn -- sample() is not differentiable
>>>
>>> yN = distN.log_prob (torch.zeros (1))
>>> yN.grad_fn
<SubBackward0 object at 0x00000188F9459CF8>
>>> yN.backward()
>>> meanN.grad
tensor([0.])
>>> stdN.grad
tensor([-1.])
>>>
>>> meanMV = torch.zeros (1, requires_grad = True)
>>> stdMV = torch.eye (1, requires_grad = True)
>>> distMV = torch.distributions.MultivariateNormal (meanMV, stdMV)
>>>
>>> xMV = distMV.sample()
>>> xMV.grad_fn             # again no grad_fn
>>>
>>> yMV = distMV.log_prob (torch.zeros (1))
>>> yMV.grad_fn
<SubBackward0 object at 0x00000188FA12CA58>
>>> yMV.backward()
>>> meanMV.grad
tensor([0.])
>>> stdMV.grad
tensor([[-0.5000]])
```
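As an aside, if you ever do need gradients to flow through the draw itself, both `Normal` and `MultivariateNormal` provide `.rsample()`, which uses the reparameterization trick and is differentiable, unlike `.sample()`. A minimal sketch:

```python
import torch

mean = torch.zeros(1, requires_grad=True)
std = torch.ones(1, requires_grad=True)
dist = torch.distributions.Normal(mean, std)

x = dist.rsample()   # reparameterized draw: mean + std * eps
x.backward()
print(mean.grad)     # tensor([1.]) -- gradient flows through the draw
```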

Best.

K. Frank
