Hello, I am trying to build a custom loss function which includes using tanh.

If the forward pass is y = tanh(x),

then what should the backward pass be?

In other words, what should the relationship between gradOutput and gradInput be?

I guess you’re implementing the backward by hand? Otherwise it will just work with autograd.

The derivative for tanh is quite common and can be found here for example. The C implementation of the backward pass by pytorch is here.
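If you do want to write the backward by hand, a minimal sketch using `torch.autograd.Function` could look like this (the class name `MyTanh` is just for illustration):

```python
import torch

class MyTanh(torch.autograd.Function):
    """tanh with a hand-written backward, using d/dx tanh(x) = 1 - tanh(x)^2."""

    @staticmethod
    def forward(ctx, x):
        y = torch.tanh(x)
        ctx.save_for_backward(y)  # the output alone is enough for the backward
        return y

    @staticmethod
    def backward(ctx, grad_output):
        (y,) = ctx.saved_tensors
        # gradInput = gradOutput * (1 - tanh(x)^2), element-wise
        return grad_output * (1.0 - y * y)

x = torch.randn(3, 4, requires_grad=True)
MyTanh.apply(x).sum().backward()  # populates x.grad via the backward above
```

Comparing `x.grad` against what the built-in `torch.tanh` produces is an easy way to check a hand-written backward.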

@albanD Thanks for the reply. I’m a bit confused by the autograd process. The documentation says it computes a vector-Jacobian product, which usually returns a vector.

However, the tanh function is element-wise, that is, Y_{ij} = nn.tanh(X_{ij}), where X and Y are both matrices.

In this case, we have a matrix-valued function, and the gradient of Y wrt X would be a 4-D Jacobian tensor.

The C source code is like this:

`*gradInput_data = *gradOutput_data * (1. - z*z);`

It seems to perform an element-wise multiplication: gradOutput_ij * (1 - z_ij^2)
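For reference, the rule in that line can be checked numerically against what autograd computes (a quick sketch, not the library’s own code):

```python
import torch

x = torch.randn(2, 3, requires_grad=True)
grad_output = torch.randn(2, 3)

y = torch.tanh(x)
y.backward(grad_output)  # autograd's gradInput ends up in x.grad

# The C kernel's rule: gradInput = gradOutput * (1 - z*z), where z = tanh(x)
z = torch.tanh(x.detach())
manual = grad_output * (1.0 - z * z)
print(torch.allclose(x.grad, manual))  # True
```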

Could you please explain how autograd actually handles this?

P.S.: Sorry if it’s inappropriate to reply to such an old post.

As you mentioned, autograd just does the vector jacobian product.

In this case, you multiply the 2D matrix of gradients wrt the output (grad_output) against the 4D Jacobian tensor of that op.

The nice thing is that the op is element-wise, so the 4D Jacobian is actually diagonal, and the vector-Jacobian product boils down to multiplying each element of the incoming 2D gradient matrix with the corresponding element on the diagonal of the Jacobian.

Here, `(1. - z*z)`

is actually the “diagonal” of the Jacobian, and the gradient wrt the input is obtained by taking the element-wise product of this term with the gradient flowing back.
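To make the “diagonal Jacobian” point concrete, here is a small sketch (using `torch.autograd.functional.jacobian`, and a 2x3 input chosen just for illustration) that builds the full Jacobian of tanh and shows the vector-Jacobian product collapsing to an element-wise multiply:

```python
import torch

torch.manual_seed(0)
x = torch.randn(2, 3, dtype=torch.double)
grad_output = torch.randn(2, 3, dtype=torch.double)

# Full Jacobian of y = tanh(x), flattened into a 6x6 matrix.
J = torch.autograd.functional.jacobian(torch.tanh, x).reshape(6, 6)

# Because tanh is element-wise, all off-diagonal entries are zero...
assert torch.allclose(J, torch.diag(torch.diag(J)))

# ...so v^T J (the vector-Jacobian product) is just an element-wise
# multiply with the diagonal, i.e. with (1 - z*z).
vjp = (grad_output.reshape(-1) @ J).reshape(2, 3)
manual = grad_output * (1.0 - torch.tanh(x) ** 2)
assert torch.allclose(vjp, manual)
```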

Thank you so much for your reply!