I was surprised to realise that it is possible to call
backward() on a computational graph that involves torch.bernoulli(). However, I am not completely sure what happens in the background.
In the small example below, the gradients with respect to the Bernoulli probabilities are, as far as I can tell, always zero.
import torch

p = torch.rand(2, 2, requires_grad=True)
loss = torch.sum(torch.bernoulli(p))
loss.backward()
print(torch.allclose(p.grad, torch.zeros_like(p)))  # prints True
So I assume this is not invoking some form of surrogate gradient, is it? Is it intentional that the gradients are always 0 in this case?
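For reference, what I have in mind by a surrogate gradient is something like a straight-through estimator, where the forward pass uses the hard sample but the backward pass acts as if the sampling were the identity on p. A minimal sketch of that idea (the ste name is mine, and as far as I can tell this is not what PyTorch actually does):

import torch

p = torch.rand(2, 2, requires_grad=True)
sample = torch.bernoulli(p)
# straight-through trick: the forward value equals the hard sample,
# but gradients flow through p as if sampling were the identity
ste = p + (sample - p).detach()
loss = torch.sum(ste)
loss.backward()
print(p.grad)  # all ones, since d(loss)/d(ste) == 1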
I would also be grateful if someone could point me to the respective
*.cpp file that implements the backward pass for torch.bernoulli().