Now, I’d like to contribute to making the convolution layer twice differentiable. Are there any examples I can follow for implementing this feature?
Another is this detailed comment:
Primarily, you need to define the backward of ConvNdBackward as ConvNd, I think.
Hi @smth, I have found that current master uses
ConvNd = torch._C._functions.ConvNd in
torch/nn/functional.py, so I cannot follow the same approach as in: https://github.com/pytorch/pytorch/pull/1507.
Finally, I found that the core function to modify is
torch/csrc/autograd/functions/convolution.cpp, but I have no idea how to convert the old-style functions to the new ones.
Did you work out the math for the backward of the backward of a convolution?
Intuitively I would guess it corresponds to a regular conv with modified parameters?
More explicitly, if your forward is:
out = conv2d(input, weight, bias, stride, padding, dilation, groups)
Then the backward can be written as:
gradInput = conv_transpose2d(gradOutput, weight, bias, new_stride, new_padding, new_output_padding, new_groups, new_dilation)
where all the new_* parameters are computed from the parameters of the original conv.
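As a quick numerical sanity check of this relation (a hedged sketch, not the internal implementation): for the simplest case, stride=1, padding=0, dilation=1, groups=1, all the new_* parameters stay at their defaults and no bias enters the backward, so the input gradient of conv2d equals a conv_transpose2d of the output gradient with the same weight.

```python
import torch
import torch.nn.functional as F

# Simple case: stride=1, padding=0, dilation=1, groups=1.
x = torch.randn(1, 2, 6, 6, requires_grad=True)
w = torch.randn(3, 2, 3, 3)

out = F.conv2d(x, w)                 # (1, 3, 4, 4)
grad_out = torch.randn_like(out)

# Input gradient computed by autograd.
(grad_input_autograd,) = torch.autograd.grad(out, x, grad_out)

# Same gradient expressed as a transposed convolution
# (note: no bias term in the backward).
grad_input_manual = F.conv_transpose2d(grad_out, w)

print(torch.allclose(grad_input_autograd, grad_input_manual, atol=1e-5))
```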
Now you would need the following:
gradGradOutput = conv2d(gradGradInput, weight, bias, new_new_stride, new_new_padding, new_new_dilation, new_new_groups)
The new_new_* parameters are a function of the new_* parameters (and thus of the original parameters).
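The second relation can also be checked numerically (again a hedged sketch for the default-parameter case): since gradInput = conv_transpose2d(gradOut, weight) is linear in gradOut, differentiating the backward pass with respect to gradOut and applying it to a perturbation gradGradInput gives exactly a regular conv2d with the same weight.

```python
import torch
import torch.nn.functional as F

# Default-parameter case: stride=1, padding=0, dilation=1, groups=1.
w = torch.randn(3, 2, 3, 3)
grad_out = torch.randn(1, 3, 4, 4, requires_grad=True)

grad_input = F.conv_transpose2d(grad_out, w)   # backward of the forward conv
gg_input = torch.randn_like(grad_input)        # perturbation of grad_input

# Differentiate the backward pass itself with autograd...
(gg_out_autograd,) = torch.autograd.grad(grad_input, grad_out, gg_input)

# ...and compare with the direct conv2d expression.
gg_out_manual = F.conv2d(gg_input, w)

print(torch.allclose(gg_out_autograd, gg_out_manual, atol=1e-4))
```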
If you have these, I can help you to add this feature.
I think a lot of the mathematics is coded up already;
in particular, the sizes should be there.
The main things to do are probably
- figure out how to optionally call forward with _cudnn_info instead of all the details, and apply that to _grad_output instead. (Or make that a separate class and give back the info for the second derivative? I have not actually looked at it enough to be able to get it out.)
- do the usual
In terms of arguments I think
- the derivative of output w.r.t. input is a ConvND with transpose “flipped”, stride same,
- the derivative of output w.r.t. filter is the ConvND of input with grad_output, dilation is the stride of the forward (based on the intuition that we need adjoints, maybe the other way round, too),
- the derivative of output w.r.t. bias should be something like it is for linear.
The definition of transpose seems to be such that you don’t need to reverse the filter or anything like that.
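The second bullet, the weight gradient as a conv of the input with grad_output where the dilation equals the forward stride, can be checked numerically. This is a hedged sketch for the single-channel, batch-1 case, with sizes chosen so the strided forward conv tiles the input exactly:

```python
import torch
import torch.nn.functional as F

s = 2  # forward stride
x = torch.randn(1, 1, 7, 7)
w = torch.randn(1, 1, 3, 3, requires_grad=True)

out = F.conv2d(x, w, stride=s)       # (1, 1, 3, 3)
grad_out = torch.randn_like(out)

# Weight gradient computed by autograd.
(grad_w_autograd,) = torch.autograd.grad(out, w, grad_out)

# Input convolved with grad_output as the kernel,
# with dilation equal to the forward stride.
grad_w_manual = F.conv2d(x, grad_out, dilation=s)

print(torch.allclose(grad_w_autograd, grad_w_manual, atol=1e-4))
```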
So the problem with ConvNd is that all the logic is in C++, so the task of adding support for grad of grad is purely a C++ thing. We haven’t done it for any ops yet, so I can’t show you a ready PR/implementation. The code @tom pointed to is not used anymore and should be removed.
Thanks a lot, I am now working on the math operations.
I have committed a PR: https://github.com/pytorch/pytorch/pull/1569. I am not sure about its correctness, so I hope for your feedback.
I have a problem doing a tensor transpose for CUDA tensors.
std::unique_ptr<Tensor> tp(weight->newTranspose(2, 3));
When I use code like the above, it raises a RuntimeError. How can I transpose a CUDA tensor?
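For reference, here is a hedged Python-level sketch of the semantics the C++ call is after (the question above concerns the old C++ Tensor API, which this does not answer directly): transpose works the same on CPU and CUDA tensors but returns a non-contiguous view, so a backend that expects contiguous memory needs a .contiguous() call first.

```python
import torch

w = torch.randn(3, 2, 4, 5)
if torch.cuda.is_available():
    w = w.cuda()  # same semantics on a CUDA tensor

wt = w.transpose(2, 3)       # view with dims 2 and 3 swapped
print(wt.shape)              # torch.Size([3, 2, 5, 4])
print(wt.is_contiguous())    # False: transpose returns a view
wt = wt.contiguous()         # materialize before passing to a
print(wt.is_contiguous())    # backend that needs contiguous memory
```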
nn.ConvTranspose2d with dilation still throws
RuntimeError: unsupported ConvNd parameters
Dilated transposed convolutions are not supported by our backends (THNN and THCUNN).
I think it would work if you use cudnn, though, but I’m not 100% sure.
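For what it’s worth, the output size a dilated transposed convolution would produce can be computed ahead of time from the standard ConvTranspose2d size formula; this is a small sketch of that formula (the helper name is made up for illustration):

```python
# Standard transposed-convolution output-size formula:
# out = (in - 1) * stride - 2 * padding
#       + dilation * (kernel - 1) + output_padding + 1
def conv_transpose_out_size(n, kernel, stride=1, padding=0,
                            output_padding=0, dilation=1):
    return ((n - 1) * stride - 2 * padding
            + dilation * (kernel - 1) + output_padding + 1)

print(conv_transpose_out_size(4, 3))                       # 6
print(conv_transpose_out_size(4, 3, stride=2, dilation=2)) # 11
```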