Softmax output has zero grad

I want to multiply two vectors a and b with different dimensions, and then send the product vector c into the objective function.
For example, the demo code is as follows:

import torch

a = torch.rand(2, requires_grad=True)
b = torch.rand(4, requires_grad=True)
c = torch.cat((a * b[:2], b[2:]), dim=0)  # c has 4 elements

d = torch.nn.functional.softmax(c, dim=0)
d.sum().backward(retain_graph=True)
print(a.grad)
print(b.grad)

c.sum().backward()
print(a.grad)
print(b.grad)

The outputs are as follows:

tensor([0., 0.])
tensor([0., 0., 0., 0.])
tensor([0.4431, 0.1512])
tensor([0.2158, 0.2565, 1.0000, 1.0000])

I don’t know why the grads of a and b are both zero when there is a softmax function. As shown above, when I remove the softmax and backpropagate from c.sum() directly, the grads of a and b are correct.

Because d is the output of a softmax, the sum of d is always 1 by definition no matter what you input. The gradient of a constant function is 0. You could try taking c[0].backward() for something more interesting.
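
As a minimal sketch (using a fresh input x rather than the a/b construction above): backpropagating from the sum of the softmax gives a zero gradient, while backpropagating from a single softmax entry does not.

import torch

x = torch.rand(4, requires_grad=True)
d = torch.nn.functional.softmax(x, dim=0)

# The sum of a softmax output is always 1, so its gradient w.r.t. x is zero.
d.sum().backward(retain_graph=True)
print(x.grad)  # tensor([0., 0., 0., 0.]) (zero, up to floating point)

# A single softmax entry is not constant in x, so it has a nonzero gradient.
x.grad = None  # clear the accumulated (zero) gradient before the next backward
d[0].backward()
print(x.grad)  # first row of the softmax Jacobian, all entries nonzero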

Best regards

Thomas
