It does, but the output of the convolution is not needed for backward. So it’s not an issue.
In general, you will get an error if something that is needed is modified.
Actually, looking again at the formulas we have today, we don’t need the output of the conv to compute its gradients. So this is expected to work.
It is possible that we were doing things differently before when that other thread you linked was created.
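To make this concrete, here is a minimal sketch (the layer sizes and tensor shapes are arbitrary, chosen just for illustration). It shows that an in-place ReLU overwriting a conv output backpropagates fine, while modifying a tensor whose value *is* needed for backward (e.g. the output of `exp`, whose derivative is that output itself) triggers the error mentioned above:

```python
import torch
import torch.nn.functional as F

# Case 1: in-place ReLU overwrites the conv output -- backward still works,
# because conv's backward never reads its own output.
conv = torch.nn.Conv2d(3, 8, kernel_size=3)
x = torch.randn(1, 3, 16, 16, requires_grad=True)
y = conv(x)
out = F.relu(y, inplace=True)   # y is modified in place
out.sum().backward()            # no error: x.grad and conv.weight.grad are populated

# Case 2: exp's backward needs its output (d exp(x)/dx = exp(x)),
# so modifying that output in place makes backward fail.
z = torch.randn(5, requires_grad=True).exp()
z.add_(1)                       # overwrites a tensor saved for backward
try:
    z.sum().backward()
except RuntimeError as e:
    print(e)                    # "... has been modified by an inplace operation"
```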
Thanks for the prompt reply. If the conv layer does not need its output for the backward pass, I wonder how its gradients are calculated. Also, how do I know which layers need their output for the backward pass (so I know where an in-place update is safe)? I didn’t find any related info in the PyTorch documentation; it would be greatly appreciated if some reference could be shared.
Ultimately we only need the gradients of the weights and the input of a conv layer, so the gradient computation is simply rearranged to work directly with the ReLU outputs.
Convolution is a linear operation, so you can see it as (ignoring the conv parameters such as stride and padding): y = conv(x, w).
Then dL/dx = dL/dy * dy/dx = conv_transpose(dL/dy, w) and dL/dw = dL/dy * dy/dw = conv(x, dL/dy). These are new conv calls with modified parameters.
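As a quick sanity check of those formulas, here is a sketch restricted to a single channel, batch size 1, stride 1 and no padding (so the shapes line up without extra bookkeeping); it compares the manual conv_transpose / conv expressions against autograd’s gradients:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 8, 8, requires_grad=True)
w = torch.randn(1, 1, 3, 3, requires_grad=True)

y = F.conv2d(x, w)              # y = conv(x, w)
y.sum().backward()              # loss = y.sum(), so dL/dy is all ones
grad_y = torch.ones_like(y)

with torch.no_grad():
    manual_dx = F.conv_transpose2d(grad_y, w)   # dL/dx = conv_transpose(dL/dy, w)
    manual_dw = F.conv2d(x, grad_y)             # dL/dw = conv(x, dL/dy)

print(torch.allclose(manual_dx, x.grad))        # True
print(torch.allclose(manual_dw, w.grad))        # True
```

Note that neither expression reads y itself, which is why overwriting the conv output in place is harmless.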
The way I check whether the output is needed is by looking at this file, which contains most of the derivative definitions.
There you can see that the various definitions for convolution depend on the input/weight but never on result, which is the output of the forward.
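You can also check this at runtime: recent PyTorch versions expose the tensors a node saved for backward as `_saved_*` attributes on `grad_fn`. The exact attribute names are an implementation detail and can differ across versions, but a sketch like this shows that the conv node saves its input and weight while `exp` saves its result:

```python
import torch

x = torch.randn(1, 3, 16, 16, requires_grad=True)
w = torch.randn(8, 3, 3, 3, requires_grad=True)

y = torch.nn.functional.conv2d(x, w)
z = x.exp()

# Attributes prefixed with "_saved" list what each backward node stashed.
print([a for a in dir(y.grad_fn) if a.startswith("_saved")])  # input/weight, no result
print([a for a in dir(z.grad_fn) if a.startswith("_saved")])  # the result itself
```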