Differentiable Sign or Step-Like Function

Hi, I’m very new to PyTorch. I have been trying to extend an autograd function that tunes multiple thresholds to return a binary output and optimizes them with BCELoss, but I’ve been struggling with the fact that any sign or step function I apply always returns a gradient of 0. In some instances I’ve been able to get it to work with ReLU and trigonometric functions; however, it then returns the exact same (very small) gradient for every threshold.

The framework of the sign function follows this idea: Output = Sign(Sum(Sign(Sign(X - Threshold)*-1 + 1))). If I wanted to apply an Or over multiple thresholds, I would then apply Sign(Output1 + Output2). The goal is to find the optimal threshold cut-off point for each of several arrays and then combine them through And/Or logic. Is there any operation in PyTorch that works with autograd to do this?
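For one threshold, a toy version of what I’m doing looks roughly like this (made-up values; I’ve left out the Sum/And/Or part and the BCELoss to keep it short):

```python
import torch
import torch.nn as nn

x = torch.tensor([0.2, 0.8, 1.5, 3.0])       # toy input values
threshold = nn.Parameter(torch.tensor(2.0))  # learnable threshold

# One-threshold piece of Sign(Sign(X - Threshold)*-1 + 1): 1 below the threshold, 0 above it
out = torch.sign(torch.sign(x - threshold) * -1 + 1)

out.sum().backward()
print(threshold.grad)  # tensor(0.) -- every sign in the chain has zero gradient
```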


Hi,

The problem with the sign function is that it returns 1 or -1, so it is piecewise constant and its gradient is 0 almost everywhere. It is therefore expected that you get a gradient of 0 when you use that function.

In general, if your loss is piecewise constant, you won’t be able to optimize it with gradient-based optimizers, as all the gradients are always 0, I’m afraid.
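You can see this directly with a tiny example:

```python
import torch

x = torch.tensor([-2.0, 0.5, 3.0], requires_grad=True)
torch.sign(x).sum().backward()
print(x.grad)  # tensor([0., 0., 0.]) -- the function is flat around every point
```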


That makes sense and explains why I’ve always gotten a gradient of 0. I’ve done some reading on Binarized Neural Networks; they overcome this issue with a quantized sign function in the forward pass and a hard-tanh gradient in the backward pass. I’ve looked through various GitHub implementations of this method; however, I can’t seem to figure out how to apply it to my threshold problem without getting a 0 gradient again. Is this a possible solution to my 0-gradient problem in autograd? Or should I try to approximate a probability with a smooth function such as tanh or atan, which mostly returns a gradient larger than 0?
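For reference, this is my rough understanding of that method from the implementations I found (a sketch: sign in the forward pass, the derivative of hardtanh as the surrogate gradient):

```python
import torch

class BinarySign(torch.autograd.Function):
    """Sign in the forward pass, hard-tanh gradient in the backward pass
    (the straight-through style estimator used in Binarized Neural Networks)."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Pass the incoming gradient through only where |x| <= 1
        # (the derivative of hardtanh), block it elsewhere.
        return grad_output * (x.abs() <= 1).float()


x = torch.tensor([-0.5, 0.2, 1.8], requires_grad=True)
BinarySign.apply(x).sum().backward()
print(x.grad)  # tensor([1., 1., 0.]) -- no longer all zeros
```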

Hi,

You can either design a smoothed version of your loss function that has non-zero gradients.

Or you can keep the piecewise-constant function and write a custom autograd.Function whose backward computes something that is not the true gradient but still points in the right direction.
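For the first option, a smoothed version of your thresholding could look something like this (the sigmoid temperature, the toy data and the soft-Or formula are just illustrative choices):

```python
import torch
import torch.nn as nn

x1 = torch.tensor([0.2, 0.8, 1.5, 3.0])  # toy feature 1
x2 = torch.tensor([2.0, 0.1, 0.3, 0.9])  # toy feature 2
y = torch.tensor([1., 0., 1., 1.])       # toy binary labels

t1 = nn.Parameter(torch.tensor(1.0))
t2 = nn.Parameter(torch.tensor(1.0))
temperature = 0.1  # smaller -> closer to a hard step, but sharper gradients

# Soft "x > threshold": a sigmoid instead of a sign/step.
p1 = torch.sigmoid((x1 - t1) / temperature)
p2 = torch.sigmoid((x2 - t2) / temperature)

# Soft Or of the two soft decisions (probabilistic Or).
p_or = 1 - (1 - p1) * (1 - p2)

loss = nn.BCELoss()(p_or, y)
loss.backward()
print(t1.grad, t2.grad)  # both non-zero, and different for each threshold
```

As the temperature goes to 0 the sigmoid approaches your hard step, but the gradients also become very peaked, so it is a trade-off you would have to tune.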
