Gradients are None Policy Gradient

I want to update my PolicyNet network based on a loss computed from two variables that are not on the graph. Any idea how I can update PolicyNet when the loss does not require grad? Thanks.

actions = PolicyNet.forward(state)
loss = criterion(a, b)  # a and b have requires_grad=False
loss = Variable(loss, requires_grad=True)
loss.backward()
print(list(PolicyNet.parameters())[0].grad)
optimizer.step()

Hi,

You should not call the .forward() function of Modules directly; just call them like: actions = PolicyNet(state).

If your PolicyNet’s params have requires_grad=True, then actions will also have requires_grad=True, and that will propagate all the way to the loss.
So you want to make sure that the PolicyNet params do require grad and that you don’t call .detach() (or use .data) on anything during the loss computation.
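
For concreteness, here is a minimal runnable sketch of that pattern, with a toy linear net and a hypothetical target standing in for your PolicyNet, state and criterion:

import torch
import torch.nn as nn

# Toy stand-ins for PolicyNet / state / criterion from the original post
policy_net = nn.Linear(4, 2)
optimizer = torch.optim.SGD(policy_net.parameters(), lr=0.01)
criterion = nn.MSELoss()
state = torch.randn(1, 4)
target = torch.zeros(1, 2)  # hypothetical target, just for illustration

actions = policy_net(state)        # call the module, not .forward()
loss = criterion(actions, target)  # loss inherits requires_grad from actions
optimizer.zero_grad()
loss.backward()
print(list(policy_net.parameters())[0].grad)  # a real gradient, not None
optimizer.step()

No wrapping of the loss in a new Variable is needed here: because actions is on the graph, the loss requires grad automatically.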

Hi, thanks for responding. Within RegressNet I have to return the loss as a Variable with requires_grad=True, or else I get “RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn”. Using this Variable loss breaks the graph; it works when I just use the actions of PolicyNet directly. I am breaking the computation graph inside RegressNet with a criterion between two variables that do not have grad. I edited my post.

Doing this is 100% wrong. You break the graph and create a new Tensor with no history. So your backward will just stop at loss.

If you properly use the actions that require grad to compute the loss, then the loss will require gradients naturally and everything will work.
You need to make sure you don’t do any op that breaks the link.
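
To make that concrete, here is a toy illustration, using torch.tensor(loss.item(), requires_grad=True) as the modern equivalent of wrapping the loss in a new Variable:

import torch
import torch.nn as nn

net = nn.Linear(3, 1)
loss = net(torch.randn(1, 3)).sum()

# Rewrapping creates a fresh leaf tensor with no history:
broken = torch.tensor(loss.item(), requires_grad=True)
broken.backward()
print(net.weight.grad)  # None: backward stopped at `broken`

# The original loss still carries the graph back to the net:
loss.backward()
print(net.weight.grad)  # a real gradient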

Thanks. The variables a and b that I compute the loss from are outputs of the eval network, so when I try to call .backward() I get the error about no grad or grad_fn.

Right,
But do a and b depend on actions? If so, they should require gradients.

They depend on actions and are inputs to an eval network, so I do not know how to make them require gradients and pass them through the eval network at the same time.

If your eval network is just a net whose params have requires_grad=False, that won’t prevent the output from requiring gradients if the input does. So if the input to the eval net requires grad, then the output will as well.
And if there is a function that computes these values based on the actions, then they should require grad, unless you detach or unpack the Variable (via .data), which you shouldn’t do.
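
As a sketch of that behaviour, assuming the eval network is an ordinary module with its params frozen (the net names here are hypothetical stand-ins for your PolicyNet and eval network):

import torch
import torch.nn as nn

policy_net = nn.Linear(4, 2)
eval_net = nn.Linear(2, 1)
for p in eval_net.parameters():
    p.requires_grad = False  # frozen eval network

actions = policy_net(torch.randn(1, 4))  # requires grad via policy params
value = eval_net(actions)                # still requires grad
value.sum().backward()

print(list(policy_net.parameters())[0].grad)  # gradients reach PolicyNet
print(eval_net.weight.grad)                   # None: eval net stays frozen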
