Hi guys,

I was confused by the gradient of the in-place version of ReLU until I found this thread. However, it is still unclear to me how the backward pass works for in-place ReLU.

Based on pytorch/derivatives.yaml at master · pytorch/pytorch · GitHub:

```yaml
- name: relu_(Tensor(a!) self) -> Tensor(a!)
  self: threshold_backward(grad, result, 0)
```

we can see that in-place ReLU (relu_) relies on the output tensor (result) for its back-propagation. However, this tensor has already been clamped to the non-negative range, so it is confusing how the gradient can be passed back when we have lost the sign of the input.
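To make my reading of that entry concrete, here is a minimal plain-Python sketch of what I understand threshold_backward(grad, result, 0) to compute (this is just my interpretation of the rule on lists of floats, not PyTorch's actual kernel):

```python
def threshold_backward(grad, result, threshold):
    # Pass the incoming gradient through wherever the saved tensor
    # exceeds the threshold, and zero it elsewhere (my reading of
    # the derivatives.yaml rule, sketched on plain lists).
    return [g if r > threshold else 0.0 for g, r in zip(grad, result)]

inputs = [-2.0, -0.5, 0.0, 0.5, 2.0]
result = [max(x, 0.0) for x in inputs]   # ReLU output: non-negative
grad = [1.0] * len(inputs)               # incoming gradient

print(threshold_backward(grad, result, 0))  # [0.0, 0.0, 0.0, 1.0, 1.0]
```

Note the mask here is built from result (the output), not from the original input.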

See the example below: for in-place ReLU, the tensor in saved_tensors used by the backward function would be non-negative.

```python
import torch


class MyReLU(torch.autograd.Function):
    """
    We can implement our own custom autograd Functions by subclassing
    torch.autograd.Function and implementing the forward and backward passes
    which operate on Tensors.
    """

    @staticmethod
    def forward(ctx, input):
        """
        In the forward pass we receive a Tensor containing the input and return
        a Tensor containing the output. ctx is a context object that can be used
        to stash information for backward computation. You can cache arbitrary
        objects for use in the backward pass using the ctx.save_for_backward method.
        """
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        """
        In the backward pass we receive a Tensor containing the gradient of the loss
        with respect to the output, and we need to compute the gradient of the loss
        with respect to the input.
        """
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        # For the in-place version, would grad_input == grad_output here,
        # since input has been modified into the non-negative range?
        grad_input[input < 0] = 0
        return grad_input
```

Thus, is the only way to get a correct in-place ReLU backward propagation to save another tensor that indicates the sign of the input (e.g. flag = input < 0)?

In other words, for in-place ReLU, is a flag tensor (e.g. flag = input < 0) saved for the backward pass, rather than the input/output tensor?
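For reference, here is a quick plain-Python check of what puzzles me: masking by the (non-negative) output seems to give the same gradient as masking by the input's sign, except possibly at exactly zero (again just a sketch on lists, not actual PyTorch code):

```python
inputs = [-2.0, -0.5, 0.0, 0.5, 2.0]
result = [max(x, 0.0) for x in inputs]   # what an in-place ReLU leaves behind
grad_output = [1.0] * len(inputs)

# Mask by the sign of the original input (what my MyReLU above does).
grad_by_input = [g if x >= 0 else 0.0 for g, x in zip(grad_output, inputs)]
# Mask by the saved output, as threshold_backward(grad, result, 0) seems to do.
grad_by_result = [g if r > 0 else 0.0 for g, r in zip(grad_output, result)]

print(grad_by_input)    # [0.0, 0.0, 1.0, 1.0, 1.0]
print(grad_by_result)   # [0.0, 0.0, 0.0, 1.0, 1.0]
```

If that is right, the two masks only disagree at input == 0, where the true derivative of ReLU is undefined anyway, so maybe no extra sign flag is needed at all?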