I saw torchvision use nn.ReLU(inplace=True) after a CONV layer (https://github.com/pytorch/vision/blob/1aef87d01eec2c0989458387fa04baebcc86ea7b/torchvision/models/vgg.py#L74). However, based on the description here (Why relu(inplace=True) does not give error in official resnet.py but it gives error in my code?), I think the output of the CONV layer is modified in place. I wonder if this affects the backward propagation through the CONV layer. Please feel free to correct me if I am wrong.
The output is indeed modified, but the output of the convolution is not needed for the backward pass, so it's not an issue.
In general, you will get an error if something that is needed is modified.
Thanks for the reply. However, one of your previous replies in another post (Why relu(inplace=True) does not give error in official resnet.py but it gives error in my code?) says that the conv operation needs its output to be able to compute the backward pass. Not sure if I missed anything.
Thank you again!
Oh right, sorry, I read it the wrong way.
And so do you get an error when you use it?
I didn’t get an error, but I wonder why there is no error, since the output is changed.
Actually, looking at the formulas we have today again, we don’t need the output of the conv to compute the gradient. So this is expected to work.
It is possible that we were doing things differently before when that other thread you linked was created.
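A minimal check of this, assuming a recent PyTorch version: run the same Conv2d followed by a ReLU with and without inplace=True and compare the weight gradients. The layer sizes and the helper name weight_grad are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(2, 3, 8, 8)
conv = nn.Conv2d(3, 4, kernel_size=3, padding=1)

def weight_grad(inplace):
    # Hypothetical helper: conv -> ReLU, backprop a scalar loss,
    # and return a copy of the conv weight gradient.
    conv.zero_grad()
    out = nn.ReLU(inplace=inplace)(conv(x))  # inplace=True overwrites the conv output
    out.sum().backward()                     # would raise if conv had saved its output
    return conv.weight.grad.clone()

g_inplace = weight_grad(inplace=True)
g_outplace = weight_grad(inplace=False)
print(torch.allclose(g_inplace, g_outplace))
```

If the conv backward depended on its output, the in-place run would fail autograd's version check instead of producing the same gradients.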
Thanks for the prompt reply. If the CONV layer does not need its output for the backward pass, how are its gradients calculated? In addition, how do I know which layers need their output for the backward pass (so that we know when an in-place update is safe)? I didn’t find any related info in the PyTorch documentation; it would be greatly appreciated if someone could share a reference.
Ultimately, we only need the gradients with respect to the weights and inputs of a conv layer, so the gradient computation is slightly changed to do it directly with the ReLU outputs.
[Figure: derivation of the gradients with respect to the conv filters]
Similar steps can be done for the input gradients.
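A related detail is why the in-place ReLU is also safe for its own backward pass: the ReLU derivative can be computed from the output alone, because the output is positive exactly where the input was. Below is a plain-Python sketch; the function names are hypothetical, and the derivative at 0 is taken as 0, which matches PyTorch's convention.

```python
def relu(xs):
    return [max(v, 0.0) for v in xs]

def relu_backward_from_input(grad_out, inp):
    # Pass the gradient through where the input was positive.
    return [g if v > 0 else 0.0 for g, v in zip(grad_out, inp)]

def relu_backward_from_output(grad_out, out):
    # out > 0 exactly where inp > 0, so the input is not needed;
    # this is what lets relu(inplace=True) overwrite its input.
    return [g if o > 0 else 0.0 for g, o in zip(grad_out, out)]

x = [-1.5, 0.0, 2.0, -0.1, 3.0]
y = relu(x)
grad_y = [1.0, 2.0, 3.0, 4.0, 5.0]
print(relu_backward_from_input(grad_y, x))
print(relu_backward_from_output(grad_y, y))
```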
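Convolution is a linear operation, so you can see it as (ignoring the conv parameters such as stride and padding):
y = conv(x, w)
dL/dx = dL/dy * dy/dx = conv_transpose(dL/dy, w)
dL/dw = dL/dy * dy/dw = conv(x, dL/dy)
These are new convolutions with modified parameters. Neither of them uses y, which is why the output can be overwritten after the forward pass.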
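The two formulas can be checked numerically in one dimension. This sketch assumes stride 1, no padding, a single channel, and PyTorch's convention that "conv" means cross-correlation; conv1d and conv_transpose1d here are small hand-written stand-ins, not the torch.nn.functional functions. Note that neither gradient reads y.

```python
def conv1d(x, w):
    """Cross-correlation, stride 1, no padding (PyTorch's 'conv' convention)."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) for i in range(len(x) - k + 1)]

def conv_transpose1d(g, w):
    """Transposed conv: scatter each g[i] * w[j] back onto position i + j."""
    k = len(w)
    out = [0.0] * (len(g) + k - 1)
    for i, gi in enumerate(g):
        for j in range(k):
            out[i + j] += gi * w[j]
    return out

x = [1.0, 2.0, -1.0, 0.5, 3.0]
w = [0.5, -1.0, 2.0]
c = [1.0, -2.0, 0.5]  # loss L = sum(c[i] * y[i]), so dL/dy = c

def loss(x, w):
    return sum(ci * yi for ci, yi in zip(c, conv1d(x, w)))

grad_y = c
grad_x = conv_transpose1d(grad_y, w)  # needs dL/dy and w only
grad_w = conv1d(x, grad_y)            # needs x and dL/dy only

# Finite-difference check (exact up to rounding, since L is linear in x and in w).
eps = 1e-6

def bump(v, p):
    return v[:p] + [v[p] + eps] + v[p + 1:]

fd_x = [(loss(bump(x, p), w) - loss(x, w)) / eps for p in range(len(x))]
fd_w = [(loss(x, bump(w, p)) - loss(x, w)) / eps for p in range(len(w))]
print(grad_x, fd_x)
print(grad_w, fd_w)
```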
The way I check whether the output is needed is by looking at this file, which contains most of the derivative definitions.
There you can see that the various definitions for convolution depend on the input and weight, but never on result, which is the output of the forward pass.
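Besides reading the derivative definitions, one can probe an op empirically: modify its output in place and see whether backward complains. needs_output_for_backward below is a hypothetical helper, not a PyTorch API. It relies on autograd's version-counter check, and its answer only holds for the op, arguments, and PyTorch version it is run against.

```python
import torch
import torch.nn.functional as F

def needs_output_for_backward(fn, *inputs):
    """Hypothetical helper: True if fn's backward reads fn's own output."""
    leaves = [t.detach().clone().requires_grad_(True) for t in inputs]
    out = fn(*leaves)
    out.mul_(2)  # in-place edit bumps the output's version counter
    try:
        out.sum().backward()
    except RuntimeError as err:
        # Autograd raises this when a tensor it saved was modified in place.
        if "modified by an inplace operation" in str(err):
            return True
        raise
    return False

print(needs_output_for_backward(torch.sigmoid, torch.randn(4)))
print(needs_output_for_backward(torch.sin, torch.randn(4)))
print(needs_output_for_backward(F.conv2d, torch.randn(1, 1, 5, 5), torch.randn(1, 1, 3, 3)))
```

sigmoid saves its result for backward, so it should report True, while sin (which saves its input) and conv2d (which saves input and weight) should report False.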