Gradients are zero when calling autograd on torch.bernoulli()

I was surprised to realise that it is possible to call backward() in a computational graph that invokes torch.bernoulli(). However, I am not completely sure what happens in the background.

In this small example, the gradients with respect to the Bernoulli probabilities are, as far as I can tell, always zero.

import torch
p = torch.rand(2, 2, requires_grad=True)
loss = torch.sum(torch.bernoulli(p))
loss.backward()  # populates p.grad; without this call p.grad is still None
torch.allclose(p.grad, torch.zeros_like(p))  # True

So I assume this is not invoking some form of surrogate gradient, is it? Is it intentional that the gradients are always 0 in this case?

I would also be grateful if someone could point me to the respective *.cpp file that implements the backward() call.

Many Thanks,


Yes, this is expected.
It is specified here:
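
For contrast, here is a minimal sketch of what a non-zero gradient estimate could look like if you build it yourself. It is not what torch.bernoulli() does internally; it is the standard score-function (REINFORCE) trick written with torch.distributions.Bernoulli. The names q, reward and surrogate, the clamp range, and the choice f(x) = x as the per-element objective are assumptions made for illustration only.

```python
import torch
from torch.distributions import Bernoulli

torch.manual_seed(0)

# Route 1: backward() through torch.bernoulli runs, but the gradient is all zeros.
p = torch.rand(2, 2, requires_grad=True)
loss = torch.bernoulli(p).sum()
loss.backward()
zero_grad = p.grad.clone()

# Route 2 (assumption, for illustration): score-function surrogate.
# Clamp the probabilities away from 0 and 1 so log_prob stays well behaved.
q = p.detach().clone().clamp(0.05, 0.95).requires_grad_(True)
dist = Bernoulli(probs=q)
sample = dist.sample()      # sampling itself carries no gradient
reward = sample.detach()    # illustrative objective f(x) = x per element

# Differentiating log_prob(sample) * f(sample) gives an unbiased estimate
# of d/dq E[f(x)] without differentiating through the sample.
surrogate = (dist.log_prob(sample) * reward).sum()
surrogate.backward()
score_grad = q.grad.clone()

print(zero_grad)   # zeros from the torch.bernoulli path
print(score_grad)  # equals sample / q for this choice of f
```

With f(x) = x, each entry of score_grad is 1/q where the sample was 1 and 0 where it was 0, so a single draw is noisy; averaging over many draws approaches the true gradient of E[x] = q, which is 1.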
