Gradients are zero when calling autograd on torch.bernoulli()

I was surprised to realise that it is possible to call backward() on a computational graph that contains a call to torch.bernoulli(). However, I am not completely sure what happens in the background.

In the small example below, the gradients with respect to the Bernoulli probabilities are, as far as I can tell, always zero.

import torch
p = torch.rand(2, 2, requires_grad=True)
loss = torch.sum(torch.bernoulli(p))
loss.backward()
torch.allclose(p.grad, torch.zeros_like(p))  # returns True: p.grad is all zeros

So I assume this is not invoking some form of surrogate gradient, is it? Is it intentional that the gradients are always 0 in this case?

I would also be grateful if someone could point me to the respective *.cpp file that implements the backward() call.

Many Thanks,
Simon

Hi,

Yes, this is expected.
The derivative is specified here: https://github.com/pytorch/pytorch/blob/727463a727e75858809a325477ac2b62ccd08e7e/tools/autograd/derivatives.yaml#L270-L271
The gradient with respect to the input is defined as zeros_like(grad), so no surrogate gradient is used; the sample is simply treated as a constant with respect to p. There is also no hand-written *.cpp file for this backward: the backward functions are code-generated from the entries in derivatives.yaml.
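
If you actually want a gradient signal with respect to p, one common workaround (to be clear, this is not what torch.bernoulli() does internally, just a sketch of one option) is a straight-through style surrogate: keep the hard 0/1 sample in the forward pass, but let the gradient flow through p in the backward pass.

import torch

p = torch.rand(2, 2, requires_grad=True)
sample = torch.bernoulli(p)

# Straight-through estimator: the forward value equals the hard sample,
# but the gradient with respect to p is the identity.
surrogate = sample.detach() + p - p.detach()

loss = torch.sum(surrogate)
loss.backward()
print(p.grad)  # all ones here, since d(loss)/d(surrogate) is 1 everywhere

Alternatively, torch.distributions.Bernoulli together with log_prob() lets you build a score-function (REINFORCE) estimator instead.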
