# Is there any soft way of counting positive values with grad preserved?

Hi,

I need a differentiable metric that roughly reflects how many positive values a tensor contains, so an exact count is not necessary.

I tried two hard operations:

1. `relu` + `torch.sign` + `sum`
2. `torch.count_nonzero`

Result: both break the gradient (`torch.sign` has zero gradient almost everywhere, and `torch.count_nonzero` is not differentiable at all).

I'm not sure whether fc layer + relu + soft sign + sum would be a workable option.

Any suggestions?

Thank you!

Hi Ximeng!

`sigmoid (x)`, which is differentiable, moves “softly” from `0` to `1` as `x`
moves from negative to positive.

You may shift where the transition occurs, `sigmoid (x - shift)`, and
sharpen the transition, `sigmoid (sharpness * x)`.

Consider:

```
>>> import torch
>>> torch.__version__
'1.12.0'
>>> _ = torch.manual_seed (2022)
>>> x = torch.randn (5, 8, requires_grad = True)
>>> x
tensor([[-0.9788, -1.5154, -0.8222,  0.1214,  0.0716, -0.0872, -0.0253, -1.6267],
        [ 0.2230, -1.6746, -1.4725,  0.9721, -0.2191, -0.9397, -1.7756, -0.6259],
        [-1.1104,  1.1890,  1.3730,  0.4915,  0.3579, -0.1685, -0.8579, -1.0574],
        [ 0.2105,  1.9045,  1.8237,  1.5122, -0.3140, -0.0810, -1.3631, -0.0701],
        [-1.1876, -1.0787,  0.9551, -0.2958,  1.0663, -0.5134, -0.3846, -1.1481]],
       requires_grad=True)
>>> (x > 0).sum()
tensor(14)
>>> torch.sigmoid (x)
tensor([[0.2731, 0.1801, 0.3053, 0.5303, 0.5179, 0.4782, 0.4937, 0.1643],
        [0.5555, 0.1578, 0.1866, 0.7255, 0.4454, 0.2810, 0.1448, 0.3484],
        [0.2478, 0.7666, 0.7979, 0.6204, 0.5885, 0.4580, 0.2978, 0.2578],
        [0.5524, 0.8704, 0.8610, 0.8194, 0.4221, 0.4798, 0.2037, 0.4825],
        [0.2337, 0.2538, 0.7221, 0.4266, 0.7439, 0.3744, 0.4050, 0.2408]],
       grad_fn=<SigmoidBackward0>)
>>> torch.sigmoid (x).sum()
>>> torch.sigmoid (10 * x).sum()
>>> torch.sigmoid (100 * x).sum()
```
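To add to the example above, here is a small sketch (my own, not from the original reply; the seed and the sharpness value 10 are arbitrary choices) checking that a moderately sharpened sigmoid sum both tracks the hard count and still delivers non-zero gradients:

```python
import torch

torch.manual_seed(0)
x = torch.randn(5, 8, requires_grad=True)

# Hard count: how many entries are actually positive.
hard = (x > 0).sum().item()

# Soft count: a sigmoid with moderate sharpness approximates the
# 0/1 indicator function while remaining differentiable.
soft = torch.sigmoid(10 * x).sum()
soft.backward()

print(hard, round(soft.item(), 3))  # soft count approximates hard count
print(x.grad.abs().max().item())    # non-zero: gradient still flows
```

The sharpness of 10 keeps the transition narrow enough for a decent approximation without pushing the per-element gradients into underflow.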

Best.

K. Frank

Thanks Frank. But I'm not sure whether the sigmoid approach might cause a vanishing-gradient problem.

```
d = torch.sigmoid (1000 * c).sum()
d.backward()
c.grad
tensor([0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        1.6938e-22, 0.0000e+00, 0.0000e+00, 0.0000e+00])
```

So I tried another possible solution, which might slightly mitigate the problem:

```
b = torch.relu(a / (1e-3 + torch.abs(a))).sum()
b.backward()
a.grad
tensor([0.0000, 0.0002, 0.0003, 0.0027, 0.0000, 0.0000, 0.0000, 0.0000, 0.0012,
        0.0029])
```
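If it helps, the idea above can be wrapped in a small helper (the name `soft_count_positive` and the default `eps` are my own choices, not from the thread):

```python
import torch

def soft_count_positive(a, eps=1e-3):
    # a / (eps + |a|) is a smooth "soft sign": roughly -1 for clearly
    # negative entries, roughly +1 for clearly positive ones, with a
    # transition of width ~eps around zero.  relu() keeps only the
    # positive side, and sum() turns the result into a soft count.
    return torch.relu(a / (eps + torch.abs(a))).sum()

torch.manual_seed(0)
a = torch.randn(10, requires_grad=True)
count = soft_count_positive(a)
count.backward()
print(count.item(), (a > 0).sum().item())  # soft vs. hard count
```

As the gradients above show, clearly positive entries still receive only a small gradient (about `eps / a**2`), so `eps` trades off count accuracy against gradient size here too.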

BTW, I noticed that gradients are also tiny for a normal softmax operation:

```
b = torch.softmax(a, dim=0).sum()
b.backward()
a.grad
tensor([7.9883e-09, 1.6591e-08, 4.4367e-09, 1.7502e-08, 1.7472e-08, 9.1981e-09,
        2.4514e-09, 4.3570e-08])
```

Obviously, softmax should always work (e.g., in a self-attention block), even though Float32 only has about 7 significant digits, if I remember correctly. So I guess there must be some misunderstanding about the gradient above.

Hi Ximeng!

`1000` is very large for the multiplier used to “sharpen” the `sigmoid()`.
This causes the `sigmoid()` to become quite close to a (discontinuous)
step function for which the gradients would be exactly zero. Very small
gradients (that underflow to zero) are to be expected here.

If you want “soft” counting, your “soft count” will be a floating-point number
that only approximates your actual count (and you can get useful gradients).

If you want your “soft count” to very closely approximate the true count,
its gradients will become very close to zero. That’s the unavoidable
trade-off.
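A rough illustration of this trade-off (my own sketch; the sharpness values mirror the ones used earlier in the thread): as the multiplier grows, more and more per-element gradients underflow to exactly zero.

```python
import torch

def soft_count_grads(x, sharpness):
    # Gradient of sigmoid(sharpness * x).sum() with respect to x.
    xs = x.clone().requires_grad_(True)
    torch.sigmoid(sharpness * xs).sum().backward()
    return xs.grad

torch.manual_seed(0)
x = torch.randn(10)

g_mild = soft_count_grads(x, 10.0)
g_sharp = soft_count_grads(x, 1000.0)

# With sharpness 10 most elements still get a non-zero gradient;
# with sharpness 1000 most gradients underflow to exactly zero,
# because sigmoid(1000 * x) is already 0.0 or 1.0 in float32.
print((g_mild == 0).sum().item(), (g_sharp == 0).sum().item())
```

This matches the `1.6938e-22` seen above: only elements sitting almost exactly on the transition keep a (tiny) gradient.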

The zero gradients (up to round-off error) are due to the fact that you
used `sum()` to reduce the result of `softmax()` to a scalar on which you
could call `.backward()`. By definition, the `sum()` of `softmax()` is exactly
one (up to round-off error), which is a constant, so the gradient of
`softmax().sum()` is indeed zero.

You could try, for comparison, `b = torch.softmax(a, dim=0).exp().sum()`
and you will see that you get non-zero gradients.
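To make both observations concrete, here is a quick sketch (my own, not from the reply above):

```python
import torch

torch.manual_seed(0)
a = torch.randn(8, requires_grad=True)

# softmax over a full dimension sums to one by construction, so
# sum() of it is a constant and its gradient is zero up to round-off.
s = torch.softmax(a, dim=0).sum()
s.backward()
print(s.item())                    # ~1.0
print(a.grad.abs().max().item())   # ~0, round-off only

# A non-linear reduction such as exp() breaks the constant-sum
# property, so real gradients reappear.
b = a.detach().clone().requires_grad_(True)
torch.softmax(b, dim=0).exp().sum().backward()
print(b.grad.abs().max().item())   # clearly non-zero
```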

Best.

K. Frank


Oh, I see. Thank you very much!