Will ReLU(inplace=True) affect the previous CONV layer on the backward pass?

yzz · July 24, 2020, 4:52pm

I saw that torchvision uses nn.ReLU(inplace=True) after a CONV layer (https://github.com/pytorch/vision/blob/1aef87d01eec2c0989458387fa04baebcc86ea7b/torchvision/models/vgg.py#L74). However, based on the description here (Why relu(inplace=True) does not give error in official resnet.py but it gives error in my code?), I think the output of the CONV layer is modified. I wonder whether this affects the backward propagation through the CONV layer. Please feel free to correct me if I am wrong.

Thanks.

albanD · July 24, 2020, 5:27pm

Hi,

It does, but the output of the convolution is not needed for backward. So it’s not an issue.
In general, you will get an error if something that is needed is modified.
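To illustrate the general rule, here is a minimal sketch (assuming a reasonably recent PyTorch) using `torch.sigmoid`, an op whose backward formula reuses its own output. Modifying that output in place bumps its version counter, and autograd refuses to run the backward pass:

```python
import torch

x = torch.randn(5, requires_grad=True)

# sigmoid saves its *output* s for backward, since d sigmoid(x)/dx = s * (1 - s)
s = torch.sigmoid(x)
s.mul_(2)  # in-place change to a tensor that autograd still needs

try:
    s.sum().backward()
    raised = False
except RuntimeError as err:
    # autograd detects the version mismatch of the saved tensor
    raised = True
    print("RuntimeError:", str(err).splitlines()[0])
```

The error message mentions that a variable needed for gradient computation has been modified by an inplace operation. An op that never saves its output has nothing to check, so no error can occur for it.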

yzz · July 24, 2020, 6:50pm

Thanks for the reply. However, one of your previous replies in another post (Why relu(inplace=True) does not give error in official resnet.py but it gives error in my code?) says that the conv operation needs its output to be able to compute the backward pass. I'm not sure if I missed anything.

Thank you again!

albanD · July 24, 2020, 8:15pm

Oh right, sorry, I read it the wrong way.
So do you get an error when you use it?

yzz · July 24, 2020, 8:29pm

I didn’t get an error, but I wonder why there is no error, since the output is changed.

albanD · July 24, 2020, 9:00pm

Actually looking at the formulas we have today again, we don’t need the output of the conv to compute the gradient. So this is expected to work.
It is possible that we were doing things differently before when that other thread you linked was created.
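A quick way to confirm that this pattern is safe in practice: run the same conv layer followed by ReLU with and without `inplace=True` and compare the resulting gradients. This is a minimal sketch; the layer sizes and input shape are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
x = torch.randn(2, 3, 16, 16, requires_grad=True)

def grads(inplace):
    # Reset gradients so each run starts from a clean state
    conv.zero_grad()
    x.grad = None
    out = nn.ReLU(inplace=inplace)(conv(x))  # same pattern as torchvision's VGG
    out.sum().backward()
    return x.grad.clone(), conv.weight.grad.clone(), conv.bias.grad.clone()

g_inplace = grads(inplace=True)
g_outplace = grads(inplace=False)

# If the conv backward depended on its (overwritten) output, these would differ
same = all(torch.allclose(a, b) for a, b in zip(g_inplace, g_outplace))
print(same)
```

Both runs finish without error, and the gradients of the input, weight and bias agree.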

yzz · July 24, 2020, 9:45pm

Thanks for the prompt reply. If the CONV layer does not need its output for the backward pass, how are its gradients calculated? Also, how do I know which layers need their output for the backward pass (so that I know where in-place updates are safe)? I didn’t find any related information in the PyTorch documentation, so it would be greatly appreciated if someone could share a reference.

gouthamvgk · July 25, 2020, 12:23pm

Ultimately, we only need the gradients with respect to the weights and the inputs of a conv layer, so the gradient computation is slightly rearranged to work directly with the gradients coming back from the ReLU outputs.

[Image: derivation of the filter gradients of a conv layer]
The image above shows this for the filter gradients; similar steps apply to the input gradients.

albanD · July 27, 2020, 1:50pm

Convolution is a linear operation, so (ignoring conv parameters such as stride and padding) you can write it as y = conv(x, w).
Then dL/dx = dL/dy dy/dx = conv_transpose(dL/dy, w) and dL/dw = dL/dy dy/dw = conv(x, dL/dy). These are new convolutions with modified parameters. Neither formula involves y, so modifying y in place does not affect them.
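The two formulas can be checked numerically against autograd. This is a sketch under simplifying assumptions: stride 1, no padding, no dilation, a single group and no bias. The batch/channel swap used for the weight gradient is one way of expressing "conv(x, dL/dy)" with `F.conv2d`, chosen for illustration rather than taken from PyTorch's internal implementation.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(2, 3, 8, 8, requires_grad=True)   # (N, C_in, H, W)
w = torch.randn(4, 3, 3, 3, requires_grad=True)   # (C_out, C_in, kH, kW)

y = F.conv2d(x, w)                 # stride 1, no padding, no bias -> (2, 4, 6, 6)
grad_y = torch.randn_like(y)       # stands in for dL/dy
y.backward(grad_y)

# dL/dx = conv_transpose(dL/dy, w): uses only the upstream gradient and the weight
grad_x_manual = F.conv_transpose2d(grad_y, w.detach())

# dL/dw = conv(x, dL/dy): swap batch and channel dims so that F.conv2d sums over the batch
grad_w_manual = F.conv2d(
    x.detach().transpose(0, 1),    # (C_in, N, H, W)
    grad_y.transpose(0, 1),        # (C_out, N, H_out, W_out) used as the "weight"
).transpose(0, 1)                  # back to (C_out, C_in, kH, kW)

ok_x = torch.allclose(x.grad, grad_x_manual, atol=1e-4)
ok_w = torch.allclose(w.grad, grad_w_manual, atol=1e-4)
print(ok_x, ok_w)
```

Neither manual computation touches `y` itself, which is why overwriting it with an in-place ReLU leaves the conv gradients intact.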

The way I check whether the output is needed is by looking at this file, which contains most of the derivative definitions.
There you can see that the various definitions for convolution depend on the input and weight but never on `result`, which is the output of the forward pass.
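As a complement to reading the derivative definitions, one can also check empirically whether an op saves its output, using `torch.autograd.graph.saved_tensors_hooks` (available in PyTorch 1.10 and later). The helper `saves_output` below is hypothetical: it records the storage pointer of every tensor autograd packs while the op runs and checks whether the op's output is among them. This is a sketch, not an official API for this purpose.

```python
import torch
import torch.nn.functional as F

def saves_output(fn, *inputs):
    """Hypothetical helper: True if autograd saves fn's output for backward."""
    saved_ptrs = []

    def pack(t):
        # Called for every tensor autograd saves for the backward pass
        saved_ptrs.append(t.data_ptr())
        return t

    def unpack(t):
        return t

    with torch.autograd.graph.saved_tensors_hooks(pack, unpack):
        out = fn(*inputs)
    # The output is "needed" if one of the saved tensors shares its storage
    return out.data_ptr() in saved_ptrs

x = torch.randn(1, 3, 8, 8, requires_grad=True)
w = torch.randn(4, 3, 3, 3, requires_grad=True)

print("conv2d :", saves_output(F.conv2d, x, w))
print("relu   :", saves_output(torch.relu, x))
print("sigmoid:", saves_output(torch.sigmoid, x))
```

Conv only saves its input and weight, whereas ReLU and sigmoid save their results. That is why an in-place op applied after a conv is fine, but an in-place op applied after a sigmoid triggers the error.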