I was surprised to realise that it is possible to call
backward() on a computational graph that involves torch.bernoulli(). However, I am not completely sure what happens in the background.
In the small example below, the gradients with respect to the Bernoulli probabilities are, as far as I can tell, always zero.
import torch

p = torch.rand(2, 2, requires_grad=True)
loss = torch.sum(torch.bernoulli(p))
loss.backward()
print(torch.allclose(p.grad, torch.zeros_like(p)))  # prints True
So I assume this is not invoking some form of surrogate gradient, is it? Is it intentional that the gradients are always 0 in this case?
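For reference, what I have in mind by a surrogate gradient is something like a straight-through estimator, where the forward pass uses the hard sample but the backward pass acts as if the sampling were the identity on p. A minimal sketch of that idea (the ste name is mine, and as far as I can tell this is not what PyTorch actually does):

import torch

p = torch.rand(2, 2, requires_grad=True)
sample = torch.bernoulli(p)
# straight-through trick: the forward value equals the hard sample,
# but gradients flow through p as if sampling were the identity
ste = p + (sample - p).detach()
loss = torch.sum(ste)
loss.backward()
print(p.grad)  # all ones, since d(loss)/d(ste) == 1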
I would also be grateful if someone could point me to the respective
*.cpp file that implements the backward pass for torch.bernoulli().