Hi,
Today, when I tried to replicate the DDPG experiment, I ran into a problem that puzzled me.
Let’s suppose “actor” and “critic” are the networks, and “state” is the input tensor.
When I use
actions = actor(state)
var_actions = torch.tensor(actions.data, requires_grad=True)
q = critic(state, var_actions)
q.backward(torch.ones(q.size()))
after q.backward(), I can see that var_actions.grad has become a tensor (the gradient with respect to var_actions). But when I use
actions = actor(state)
q = critic(state, actions)
q.backward(torch.ones(q.size()))
after q.backward(), when I check the grad of “actions”, the value is None.
I can’t find the problem. The only difference seems to be whether I create a new tensor, so why does the gradient disappear in the second case?
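Here is a minimal, self-contained sketch of what I mean (the tiny actor/critic modules, layer sizes, and state dimensions below are just placeholders I made up for illustration, not the actual DDPG networks):

import torch
import torch.nn as nn

# Toy stand-ins for the real DDPG networks; the layer sizes are made up.
class Critic(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.fc = nn.Linear(state_dim + action_dim, 1)
    def forward(self, state, action):
        return self.fc(torch.cat([state, action], dim=1))

actor = nn.Linear(3, 2)      # "actor": state (dim 3) -> action (dim 2)
critic = Critic(3, 2)
state = torch.randn(4, 3)    # a batch of 4 states

# Case 1: copy the actions into a new tensor before feeding the critic.
actions = actor(state)
var_actions = torch.tensor(actions.data, requires_grad=True)
q = critic(state, var_actions)
q.backward(torch.ones(q.size()))
print(var_actions.grad)      # a tensor with the gradient w.r.t. the actions

# Case 2: feed the actor's output directly into the critic.
actions = actor(state)
q = critic(state, actions)
q.backward(torch.ones(q.size()))
print(actions.grad)          # prints None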