How to modify ConvNdBackward to make nn.ConvNd be twice differentiable

Now, I’d like to contribute by making the convolution layer twice differentiable. Are there any examples I can follow for implementing this feature?

Hi Marvin,

One pointer is just other PRs converting old style functions to new:

Another is this detailed comment:

Primarily, you need to define the backward of ConvNdBackward as a ConvNd, I think.


Hi @smth, I have found that current master is just using ConvNd = torch._C._functions.ConvNd, so I cannot follow the same approach as:

Finally, I found that the core function to modify is ConvBackward::apply in torch/csrc/autograd/functions/convolution.cpp, but I have no idea how to convert this old-style function to the new style.


Did you work out all the math of what the backward of the backward of a convolution is?
Intuitively, I would guess it corresponds to a regular conv with modified parameters.

More explicitly, if your forward is:
out = conv2d(input, weight, bias, stride, padding, dilation, groups)

Then the backward can be written as:
gradInput = conv_transpose2d(gradOutput, weight, None, new_stride, new_padding, new_output_padding, new_groups, new_dilation)
where all the new_* parameters are computed from the parameters of the original conv (the bias does not contribute to gradInput, so None is passed).
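This identity is easy to check numerically for the default parameters. A minimal sketch, assuming a PyTorch build where the first backward of conv2d works, and using stride=1, padding=0 and no bias so that the new_* parameters are trivial:

```python
import torch
import torch.nn.functional as F

# With stride=1, padding=0, dilation=1, groups=1 and no bias, autograd's
# gradient w.r.t. the input equals a transposed convolution of the
# incoming gradient with the same weight.
x = torch.randn(1, 3, 8, 8, requires_grad=True)
w = torch.randn(4, 3, 3, 3)

out = F.conv2d(x, w)
grad_out = torch.randn_like(out)

grad_input, = torch.autograd.grad(out, x, grad_out)
manual = F.conv_transpose2d(grad_out, w)

print(torch.allclose(grad_input, manual, atol=1e-4))  # True
```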

Now you would need the following:
gradGradOutput = conv2d(gradGradInput, weight, None, new_new_stride, new_new_padding, new_new_dilation, new_new_groups)
where the new_new_* parameters are a function of the new_* parameters (and thus of the original parameters).
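On a build where convolution double backward is available, this relation can be verified with two calls to torch.autograd.grad. A sketch for the simplest parameters (stride=1, padding=0, no bias), where the new_new_* parameters are again trivial:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8, requires_grad=True)
w = torch.randn(4, 3, 3, 3)

out = F.conv2d(x, w)
g = torch.randn_like(out, requires_grad=True)  # gradOutput

# Keep the graph of the first backward so we can differentiate through it.
grad_input, = torch.autograd.grad(out, x, g, create_graph=True)

gg = torch.randn_like(x)  # gradGradInput
grad_grad_output, = torch.autograd.grad(grad_input, g, gg)

# Differentiating gradInput w.r.t. gradOutput is a plain conv2d of the
# perturbation with the same weight.
manual = F.conv2d(gg, w)
print(torch.allclose(grad_grad_output, manual, atol=1e-4))  # True
```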

If you have these, I can help you to add this feature.

Hi Marvin,

I think a lot of the mathematics is coded up already; in particular, the sizes should be there.

The main things to do are probably

  • figure out how to optionally call forward with _cudnn_info instead of all the details, and apply that to grad_output in _grad_input and _grad_output instead. (Or make that a separate class and give back the info for the second derivative? I have not actually looked at it closely enough to work that out.)
  • do the usual @staticmethod and self→ctx business.

In terms of arguments I think

  • the derivative of the output w.r.t. the input is a ConvNd with transpose “flipped” and the same stride,
  • the derivative of the output w.r.t. the filter is the ConvNd of the input with grad_output, where the dilation is the stride of the forward (based on the intuition that we need adjoints; maybe the other way round, too),
  • the derivative of the output w.r.t. the bias should be something like it is for linear.

The definition of transpose seems to be chosen so that you don’t need to reverse the filter or anything like that.
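The last two bullets can be sanity-checked numerically in the simplest case (stride=1, padding=0). This is only a sketch; the batch/channel transpose trick below is my own way of expressing "the ConvNd of the input with grad_output" in Python, not the exact ConvNd internals:

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 3, 8, 8)
w = torch.randn(4, 3, 3, 3, requires_grad=True)
b = torch.randn(4, requires_grad=True)

out = F.conv2d(x, w, b)
grad_out = torch.randn_like(out)
grad_w, grad_b = torch.autograd.grad(out, (w, b), grad_out)

# Weight gradient: a convolution of the input with grad_output, with
# batch and channel dims swapped so the batch sum becomes the channel
# reduction inside conv2d (with stride > 1 this would use dilation=stride).
manual_w = F.conv2d(x.transpose(0, 1), grad_out.transpose(0, 1)).transpose(0, 1)

# Bias gradient: reduce grad_output over batch and spatial dims, as for linear.
manual_b = grad_out.sum(dim=(0, 2, 3))

print(torch.allclose(grad_w, manual_w, atol=1e-4))  # True
print(torch.allclose(grad_b, manual_b, atol=1e-4))  # True
```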

Best regards


So the problem with ConvNd is that all the logic is in C++, so the task of adding support for grad of grad is purely a C++ thing. We haven’t done it for any ops yet, so I can’t show you a ready PR/implementation. The code @tom pointed to is not used anymore and should be removed.


Thanks a lot, I am now working on the math operations.


Hi all,

I have committed a PR. I am not sure about its correctness, so I hope for your discussion.
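One way to test such a PR numerically is torch.autograd's gradcheck/gradgradcheck, which compare the analytic first and second derivatives against finite differences. A sketch, assuming a build where conv double backward is implemented (double precision is needed for the numeric comparison to be tight enough):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 2, 5, 5, dtype=torch.double, requires_grad=True)
w = torch.randn(3, 2, 3, 3, dtype=torch.double, requires_grad=True)

# gradcheck verifies the first derivative, gradgradcheck the second;
# both return True on success and raise on a mismatch.
print(torch.autograd.gradcheck(F.conv2d, (x, w)))
print(torch.autograd.gradgradcheck(F.conv2d, (x, w)))
```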


Hi all,

I have a problem doing a tensor transpose for CUDA tensors.

std::unique_ptr<Tensor> tp(weight->newTranspose(2, 3));

But when I use code like the above, it raises a RuntimeError. So how can I transpose a CUDA tensor?
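I can't speak to the C++ internals here, but for comparison, the Python-level transpose works the same way for CPU and CUDA tensors and returns a view without copying:

```python
import torch

# Python-level analogue of the C++ newTranspose(2, 3) call above:
# swap the last two dimensions of a weight tensor.
w = torch.randn(4, 3, 3, 5)
wt = w.transpose(2, 3)
print(wt.shape)  # torch.Size([4, 3, 5, 3])

# The same call works on a CUDA tensor when one is available.
if torch.cuda.is_available():
    print(w.cuda().transpose(2, 3).shape)
```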


Using nn.ConvTranspose2d with dilation still throws RuntimeError: unsupported ConvNd parameters


Dilated transposed convolutions are not supported by our backends (THNN and THCUNN).
I think it would work if you use cuDNN, though I’m not 100% sure.
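For reference, on recent PyTorch releases dilated transposed convolution is supported, so a snippet like the following runs; on the old THNN/THCUNN backends discussed here it raised the RuntimeError above:

```python
import torch
import torch.nn as nn

m = nn.ConvTranspose2d(3, 4, kernel_size=3, dilation=2)
out = m(torch.randn(1, 3, 8, 8))

# H_out = (H_in - 1) * stride - 2 * padding + dilation * (k - 1) + 1
#       = (8 - 1) * 1 - 0 + 2 * 2 + 1 = 12
print(out.shape)  # torch.Size([1, 4, 12, 12])
```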