I have an autoencoder, and between the encoder and decoder I transform the data using torch.sign. When I do that, the backpropagation of the gradient stops at that point.

If I replace torch.sign with torch.sigmoid, I don’t have that problem and the gradient propagates all the way back to the beginning.

Do I have to do something different with torch.sign?

The sign function is not differentiable at 0, and everywhere else its gradient is 0. So it is expected that you won’t be able to get useful gradients through it.
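To make this concrete, here is a small sketch comparing the two functions. It assumes a recent PyTorch version, where (as far as I know) the backward pass through torch.sign runs without error but delivers a gradient of zeros, which is why nothing upstream gets updated. The input values are arbitrary.

```python
import torch

x = torch.tensor([-2.0, -0.5, 0.5, 2.0], requires_grad=True)

# sign: backward runs, but the gradient that reaches x is all zeros
torch.sign(x).sum().backward()
sign_grad = x.grad.clone()

x.grad = None  # reset before the second backward pass

# sigmoid: the gradient is strictly positive for these inputs
torch.sigmoid(x).sum().backward()
sigmoid_grad = x.grad.clone()

print(sign_grad)
print(sigmoid_grad)
```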

I see the sign function as a non-linear function where small variations of the input give the same output, so the derivative is like the derivative of a constant: 0 (except at 0, where the function jumps).

A smooth approximation of the function would be a sigmoid with parameter beta, \frac{1}{1+e^{-\beta x}}, which has derivative \frac{\beta e^{-\beta x}}{(1+e^{-\beta x})^2}. If I want to create a new function from this sigmoid with beta, do I have to evaluate the derivative formula on each element of grad_output, the parameter received in the backward function?

If you want the “true” gradient to be used, then just implement the function you want and autograd will get the gradient for you.
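For the sigmoid with beta, that could look like the sketch below. The name sigmoid_beta and the value beta = 5.0 are just illustrative choices, not anything from the library. The check at the end compares what autograd computes against the derivative formula from the question.

```python
import torch

def sigmoid_beta(x, beta=5.0):
    # 1 / (1 + exp(-beta * x)); beta=5.0 is an arbitrary illustrative value
    return torch.sigmoid(beta * x)

beta = 5.0
x = torch.tensor([-1.0, -0.1, 0.0, 0.1, 1.0],
                 dtype=torch.float64, requires_grad=True)

# No custom backward needed: autograd differentiates through torch.sigmoid
sigmoid_beta(x, beta).sum().backward()

# Analytic derivative: beta * exp(-beta x) / (1 + exp(-beta x))^2
with torch.no_grad():
    e = torch.exp(-beta * x)
    analytic = beta * e / (1 + e) ** 2

print(torch.allclose(x.grad, analytic))
```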

If you want the backward to compute something other than the gradient of your function, you can see this doc that explains how to do that with a custom autograd Function, where you have to specify the backward to use.
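A minimal sketch of that second option, assuming you want torch.sign in the forward pass and the sigmoid-with-beta derivative as a stand-in gradient in the backward pass. The class name SignWithSigmoidGrad and beta = 5.0 are hypothetical choices for illustration. Note that the derivative formula is evaluated on the saved input x, not on grad_output; grad_output is the gradient arriving from the later layers, and it is multiplied elementwise by that derivative (chain rule).

```python
import torch

class SignWithSigmoidGrad(torch.autograd.Function):
    """Hypothetical example: sign in forward, sigmoid derivative in backward."""

    @staticmethod
    def forward(ctx, x, beta):
        ctx.save_for_backward(x)  # keep the input for the backward pass
        ctx.beta = beta           # beta is a plain Python float here
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        s = torch.sigmoid(ctx.beta * x)
        # beta * s * (1 - s) equals beta * exp(-beta x) / (1 + exp(-beta x))^2
        surrogate = ctx.beta * s * (1 - s)
        # One return value per forward input; beta gets no gradient
        return grad_output * surrogate, None

x = torch.tensor([-1.0, -0.1, 0.1, 1.0],
                 dtype=torch.float64, requires_grad=True)
y = SignWithSigmoidGrad.apply(x, 5.0)
y.sum().backward()

print(y)       # output of the forward pass (sign values)
print(x.grad)  # surrogate gradient, nonzero unlike plain torch.sign
```

In the autoencoder, you would call SignWithSigmoidGrad.apply(...) on the encoder output in place of torch.sign. A larger beta makes the surrogate gradient more sharply peaked around 0.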