I have an autoencoder, and between the encoder and decoder I transform the data using torch.sign. When I do this, backpropagation of the gradient stops at that point.
If I replace torch.sign with torch.sigmoid, I don’t have that problem and the gradient flows back all the way to the beginning.
Do I have to do something different with torch.sign?
The sign function is not differentiable at 0, and everywhere else its gradient is 0. So it is expected that you won’t be able to get useful gradients through it.
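A quick check (not from the thread) of the behavior described above: gradients through torch.sign are zero almost everywhere, so upstream parameters receive no signal, while torch.sigmoid passes nonzero gradients back.

```python
import torch

# Gradient through torch.sign: zero almost everywhere.
x = torch.randn(5, requires_grad=True)
torch.sign(x).sum().backward()
print(x.grad)  # tensor of zeros: no signal reaches earlier layers

# Gradient through torch.sigmoid: nonzero, so backprop continues.
z = torch.randn(5, requires_grad=True)
torch.sigmoid(z).sum().backward()
print(z.grad)  # nonzero gradients flow back
```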
I see the sign function as a non-linearity where small variations of the input give the same output, so the derivative is like the derivative of a constant, i.e. 0 (except near 0, where the function jumps).
A smooth approximation of the function would be a sigmoid with parameter beta, \frac{1}{1+e^{-\beta x}}, which has derivative \frac{\beta e^{-\beta x}}{(1+e^{-\beta x})^2}. If I create a new Function based on this sigmoid, is grad_output, the tensor received in the backward function, what I have to multiply element-wise by this derivative formula?
If you want the “true” gradient to be used, then just implement the function you want and autograd will get the gradient for you.
If you want the backward to compute something other than the gradient of your function, you can see this doc that explains how to do that with a custom autograd Function, where you have to specify the backward yourself.
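Putting the two answers together, a minimal sketch of such a custom autograd Function: the forward uses torch.sign, and the backward substitutes the sigmoid derivative discussed above. The class name and the beta value are my own choices, not from the thread; grad_output is multiplied element-wise by the surrogate derivative, which answers the question above.

```python
import torch

class SignWithSigmoidGrad(torch.autograd.Function):
    """Forward: torch.sign(x).
    Backward: gradient of sigmoid(beta * x) as a surrogate,
    i.e. beta * s * (1 - s) with s = sigmoid(beta * x)."""

    @staticmethod
    def forward(ctx, x, beta):
        ctx.save_for_backward(x)
        ctx.beta = beta
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        s = torch.sigmoid(ctx.beta * x)
        # Chain rule: multiply grad_output element-wise by the
        # surrogate derivative of sigmoid(beta * x).
        grad_x = grad_output * ctx.beta * s * (1 - s)
        return grad_x, None  # no gradient for beta

# Usage: the output is still exactly sign(x), but gradients now flow.
x = torch.randn(4, requires_grad=True)
y = SignWithSigmoidGrad.apply(x, 5.0)  # beta = 5.0 is an arbitrary choice
y.sum().backward()
print(x.grad)  # nonzero surrogate gradients
```

With this, the autoencoder can keep the hard sign in the forward pass while still training the encoder end to end.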