Help with derivatives inside loss function

Check the outputs of your model and make sure they were not set to zero by F.relu (a quick way to check this is sketched at the end of this post).
Here is a small example showing how the initialization could produce the zero gradients you are seeing:

import torch
import torch.nn.functional as F

# negative init: every entry of x is (almost surely) negative
x = torch.randn(4, 4) - 10.
x.requires_grad_()

y = torch.diagonal(x)
z = F.relu(y)          # all diagonal entries are negative, so ReLU clamps them to zero
z.mean().backward()
x.grad                 # no gradient flows back through the clamped values
# tensor([[0., 0., 0., 0.],
#         [0., 0., 0., 0.],
#         [0., 0., 0., 0.],
#         [0., 0., 0., 0.]])

# positive init: every entry of x is (almost surely) positive
x = torch.randn(4, 4) + 10.
x.requires_grad_()

y = torch.diagonal(x)
z = F.relu(y)          # diagonal entries are positive, so ReLU passes them through unchanged
z.mean().backward()
x.grad                 # d(mean)/dx = 1/4 for each of the 4 diagonal entries
# tensor([[0.2500, 0.0000, 0.0000, 0.0000],
#         [0.0000, 0.2500, 0.0000, 0.0000],
#         [0.0000, 0.0000, 0.2500, 0.0000],
#         [0.0000, 0.0000, 0.0000, 0.2500]])

This example doesn’t use your full model, only the last operations, but the same reasoning applies: if every input to F.relu is negative, no gradient will flow back to anything upstream.
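If you want to check this in your actual code, you could look at the tensor right after the ReLU, before it goes into the loss. Here is a minimal sketch, using a stand-in nn.Linear with a strongly negative bias (model, inp, and out are just placeholder names, not from your code):

import torch
import torch.nn as nn
import torch.nn.functional as F

# stand-in model: the large negative bias pushes every output below zero
model = nn.Linear(4, 4)
with torch.no_grad():
    model.bias.fill_(-10.)

inp = torch.randn(2, 4)
out = F.relu(model(inp))

print((out == 0).float().mean())      # fraction of activations clamped to zero (here: 1.0)
out.mean().backward()
print(model.weight.grad.abs().sum())  # tensor(0.) -> no gradient reaches the weights

If that fraction is 1.0 (or close to it), the ReLU is killing all gradients, and changing the initialization as in the positive example above should help.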
