The derivative is implemented here as:
self: grad * self.sgn()
and since torch.sgn evaluates to:
torch.tensor(0.).sgn()
# tensor(0.)
torch.tensor(1.).sgn()
# tensor(1.)
torch.tensor(-1.).sgn()
# tensor(-1.)
the backward pass would then return a zero gradient at x = 0.
I don’t know why it was defined this way, but based on e.g. this answer the zero output might be “convenient” for users: |x| is not differentiable at 0, and 0 is a valid subgradient there.
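A quick autograd check (a minimal sketch; it only assumes `torch.abs` and its backward formula described above) confirms the zero gradient at the origin and the usual sign elsewhere:

```python
import torch

# The backward formula grad * self.sgn() implies the gradient of
# abs(x) at x = 0 is 0, because sgn(0) == 0.
x = torch.tensor(0., requires_grad=True)
torch.abs(x).backward()
print(x.grad)  # tensor(0.)

# At a nonzero point the ordinary sign is recovered.
x2 = torch.tensor(-2., requires_grad=True)
torch.abs(x2).backward()
print(x2.grad)  # tensor(-1.)
```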