Does torch.cat have a gradient?

I have three weight maps that are output from a conv layer. They all have the same size, [B, C, H, W].

My goal is to apply a sparsemax (a sparse variant of softmax) normalization at every pixel across the three weight maps, so that after normalization the values sum to one and follow a sparse distribution. But my implementation above doesn’t seem to work. So I wonder: does torch.cat have a gradient, or is it something else that leads to the mistake?
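For reference, a differentiable sparsemax can be built entirely from autograd-friendly ops (sort, cumsum, gather, clamp), following the projection-onto-the-simplex formulation of Martins & Astudillo (2016). This is a minimal sketch, not the original implementation; the function name and the `w1`/`w2`/`w3` tensors below are illustrative:

```python
import torch

def sparsemax(z, dim=-1):
    """Sparsemax along `dim`: Euclidean projection of z onto the simplex."""
    # Sort scores in descending order along the target dimension.
    z_sorted, _ = torch.sort(z, dim=dim, descending=True)
    # k = 1, 2, ..., n, shaped for broadcasting along `dim`.
    k = torch.arange(1, z.size(dim) + 1, device=z.device, dtype=z.dtype)
    view = [1] * z.dim()
    view[dim] = -1
    k = k.view(view)
    z_cumsum = z_sorted.cumsum(dim)
    # Support size: largest k with 1 + k * z_(k) > sum_{j<=k} z_(j).
    support = 1 + k * z_sorted > z_cumsum
    k_z = support.sum(dim=dim, keepdim=True)
    # Threshold tau so that the clipped output sums to one.
    tau = (z_cumsum.gather(dim, k_z - 1) - 1) / k_z.to(z.dtype)
    return torch.clamp(z - tau, min=0)

# Usage on three [B, C, H, W] weight maps (hypothetical tensors):
# stack them into [B, 3, C, H, W] and normalize per pixel over the 3 maps.
w1 = torch.randn(2, 4, 8, 8, requires_grad=True)
w2 = torch.randn(2, 4, 8, 8, requires_grad=True)
w3 = torch.randn(2, 4, 8, 8, requires_grad=True)
stacked = torch.stack((w1, w2, w3), dim=1)   # [B, 3, C, H, W]
weights = sparsemax(stacked, dim=1)          # sums to 1 over the 3 maps
weights.sum().backward()                     # gradients reach w1, w2, w3
```

Since every op in the chain supports autograd, gradients flow back to the three maps; the zeros introduced by `clamp` are exactly the sparsity pattern of sparsemax.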

Hi – torch.cat does pass gradients, as you can verify with the following:

import torch

a = torch.ones(2, requires_grad=True)
b = torch.ones(3, requires_grad=True)
c = torch.cat((a, b))
output = c.sum()
output.backward()
print(a.grad)  # tensor([1., 1.])
print(b.grad)  # tensor([1., 1., 1.])

Could you be more specific about your problem?