I use the output of `torch.distributions.Categorical.sample()` internally in my computation graph, not as the final step. (My sampled action affects the update of a state variable.) What gradient calculation does PyTorch apply by default in this case: REINFORCE, a pathwise gradient, or nothing?
Hi,
The default is nothing, as far as I know: the returned samples have `requires_grad=False`.
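A quick way to check this is sketched below. It also shows that `log_prob` of the sample stays differentiable, so a REINFORCE-style (score-function) surrogate loss can be written by hand. The `logits`, the `rewards` table, and the way the action is used downstream are hypothetical stand-ins, not taken from the code in the question.

```python
import torch
from torch.distributions import Categorical

torch.manual_seed(0)

# Hypothetical learnable parameters of the discrete distribution.
logits = torch.zeros(4, requires_grad=True)
dist = Categorical(logits=logits)

# The sample is an integer tensor with no graph attached to it.
action = dist.sample()
sample_requires_grad = action.requires_grad

# Hypothetical downstream use of the action (a stand-in for the state update).
rewards = torch.tensor([1.0, 0.0, 0.0, 0.0])
reward = rewards[action]

# Score-function surrogate: gradients reach logits through log_prob,
# not through the sample itself.
surrogate = -(dist.log_prob(action) * reward.detach())
surrogate.backward()

grad_is_populated = logits.grad is not None
```

Here `sample_requires_grad` comes out `False`, while `logits.grad` is filled in after `backward()`. Nothing of this kind is applied automatically: the surrogate term has to be added explicitly.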
Thanks very much. So I guess we can only learn the parameters of discrete distributions if they are at the output layer.