Hi,
Today, when I tried to replicate the DDPG experiment, I found a problem that puzzled me.
Let's suppose “actor” and “critic” are the networks, and “state” represents the input tensor.
When I use

actions = actor(state)
var_actions = torch.tensor(actions.data, requires_grad=True)
q = critic(state, var_actions)
q.backward(torch.ones(q.size()))

After q.backward(), var_actions.grad is a tensor (the gradient with respect to var_actions). But when I use

actions = actor(state)
q = critic(state, actions)
q.backward(torch.ones(q.size()))

After q.backward(), when I check the grad of “actions”, the value is None.

I can’t find the problem. It seems the only difference is whether I create a new tensor?

PyTorch builds a dynamic computational graph (like a tree) during the forward pass and uses it when calculating the gradients.
The leaves of this tree are the input tensors. By default, you can only read gradients from `.grad` on leaves, if I am right.

Gradients are calculated by tracing the graph from the root (the output tensor) to the leaves and multiplying the gradients along the way using the chain rule.
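To make the leaf/root picture concrete, here is a minimal sketch with plain tensors. The names `x`, `y` and `z` are made up for illustration and are not from the original code.

```python
import torch

# x is created directly by the user, so it is a leaf of the graph.
x = torch.tensor([1.0, 2.0], requires_grad=True)

# y is the result of an operation on x, so it is a non-leaf (intermediate) node.
y = x * 3

# z is the root (a scalar output) from which backward() starts.
z = (y ** 2).sum()

z.backward()

print(x.is_leaf, y.is_leaf)  # True False

# Chain rule: dz/dx = dz/dy * dy/dx = (2 * y) * 3 = 18 * x
print(x.grad)  # tensor([18., 36.])
```

Only `x.grad` is filled in here; the gradient for `y` is used during the backward pass and then discarded.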

I created several examples in here.

`var_actions = torch.tensor(actions.data, requires_grad=True)` will be a leaf, which is why you are getting its gradients. In the non-leaf case you will get `None`, as you confirmed.
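If you do want the gradient of the non-leaf `actions`, one option is to call `retain_grad()` on it before `backward()`. Below is a rough sketch of both cases. The tiny `actor` and `Critic` are hypothetical stand-ins, since the real networks were not posted, and the leaf copy is built with `detach().requires_grad_(True)` rather than `torch.tensor(actions.data, ...)`, which recent PyTorch versions warn about.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the networks in the question.
actor = nn.Linear(3, 2)  # state (3 features) -> action (2 features)

class Critic(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(3 + 2, 1)

    def forward(self, state, action):
        # Concatenate state and action, then map to a single Q value.
        return self.fc(torch.cat([state, action], dim=1))

critic = Critic()
state = torch.randn(4, 3)

# Case A: a detached copy is a new leaf, so .grad is populated,
# but the graph back to the actor is cut.
var_actions = actor(state).detach().requires_grad_(True)
q = critic(state, var_actions)
q.backward(torch.ones_like(q))
print(var_actions.is_leaf)        # True
print(var_actions.grad.shape)     # torch.Size([4, 2])
actor_grad_case_a = actor.weight.grad
print(actor_grad_case_a)          # None: nothing flowed back into the actor

# Case B: keep the graph intact and ask autograd to keep the
# gradient of the non-leaf tensor.
actions = actor(state)
actions.retain_grad()
q = critic(state, actions)
q.backward(torch.ones_like(q))
print(actions.is_leaf)            # False
print(actions.grad.shape)         # torch.Size([4, 2])
print(actor.weight.grad is None)  # False: the actor received gradients
```

Note the trade-off: in case A the actor's parameters get no gradient at all, so for the DDPG actor update you would normally want case B (or simply not inspect `actions.grad`).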