I am currently fiddling around with a PPO implementation where the loss depends on the entropy of a multivariate normal distribution. The entropy is calculated as follows:
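The original snippet is not reproduced here, so below is a minimal sketch of what such an entropy computation typically looks like: a MultivariateNormal with a diagonal covariance built from a learnable self.action_std. Apart from self.action_std and dist_entropy, all names (ActorCritic, evaluate, the layer sizes) are illustrative assumptions, not the original code.

```python
import torch
import torch.nn as nn
from torch.distributions import MultivariateNormal

class ActorCritic(nn.Module):  # illustrative reconstruction, not the original code
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.actor = nn.Sequential(
            nn.Linear(state_dim, 64), nn.Tanh(),
            nn.Linear(64, action_dim),
        )
        # learnable per-dimension standard deviation of the action distribution
        self.action_std = nn.Parameter(torch.ones(action_dim))

    def evaluate(self, state, action):
        action_mean = self.actor(state)
        # diagonal covariance matrix built from the learnable std
        cov_mat = torch.diag_embed(self.action_std ** 2)
        dist = MultivariateNormal(action_mean, cov_mat)
        action_logprobs = dist.log_prob(action)
        dist_entropy = dist.entropy()
        return action_logprobs, dist_entropy
```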
and the loss also depends on the return value “dist_entropy”. However, during the backward pass, the parameter “self.action_std” does not change at all. Any clue about the reason?
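The second question below refers to a snippet that is also not shown. The following is a reconstruction, assuming the four definitions below; they match the print output described in the question, but the exact original code may differ.

```python
import torch
import torch.nn as nn

a = nn.Parameter(torch.ones([1]))
b = nn.Parameter(torch.ones([1]).to('cuda'))
c = nn.Parameter(torch.ones([1])).to('cpu')
d = nn.Parameter(torch.ones([1])).to('cuda')

print(a)  # Parameter containing: tensor([1.], requires_grad=True)
print(b)  # Parameter containing: tensor([1.], device='cuda:0', requires_grad=True)
print(c)  # Parameter containing: tensor([1.], requires_grad=True)
print(d)  # tensor([1.], device='cuda:0', grad_fn=<CopyBackwards>)
          # (newer PyTorch versions print grad_fn=<ToCopyBackward0> here)
```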
a, b, and c make sense, but what happens to d? From my basic PyTorch understanding (and the help of Stack Overflow), a and b are straightforward. c would be copied to the CPU, but as it is already stored there, the .to('cpu') has no effect.
Now to d: my understanding until now was that torch.ones([1]) is a “normal” PyTorch tensor that becomes a Parameter (and is thus differentiable) and gets moved to the GPU by the .to('cuda') in the last step, so that d finally points to the tensor on the GPU. But the print output indicates that something different is going on… Btw, what does grad_fn=<CopyBackwards> mean?
Actually, the special case here is c.
a and b are Parameters because you explicitly created them as Parameters.
d is just a Tensor with a grad_fn: it is not a leaf anymore, because it was created by a differentiable op (the .to('cuda') copy). That is also what grad_fn=<CopyBackwards> means: it is the backward function recorded for that copy operation, so gradients can flow back through it to the original CPU Parameter.
c is the special case: because the Tensor is already on the CPU, .to('cpu') is a no-op and simply returns the Parameter itself, so you still get the Parameter back.
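A minimal sketch of the practical consequence, assuming CUDA is available (the variable and class names are illustrative): optimizers only update leaf tensors, and they refuse non-leaf tensors outright, so the copy produced by .to('cuda') can never be trained directly.

```python
import torch
import torch.nn as nn

# Pitfall: .to('cuda') on a Parameter returns a non-leaf copy.
bad = nn.Parameter(torch.ones([1])).to('cuda')
print(bad.is_leaf)  # False
# torch.optim.Adam([bad])  # would raise: ValueError: can't optimize a non-leaf Tensor

# Fix 1: create the Parameter on the target device directly.
good = nn.Parameter(torch.ones([1], device='cuda'))
print(good.is_leaf)  # True

# Fix 2: register the Parameter on a module and move the whole module;
# Module.to() moves registered parameters in place, so they stay leaves.
class Policy(nn.Module):
    def __init__(self):
        super().__init__()
        self.action_std = nn.Parameter(torch.ones(1))

model = Policy().to('cuda')
print(model.action_std.is_leaf)  # True
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```

This is also the likely answer to the first question: if self.action_std was assigned as nn.Parameter(...).to(device), the attribute stored on the module is the non-leaf copy. It is not registered in model.parameters(), so the optimizer never sees it, which would explain why it does not change during training.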