Now, I’d like to contribute by making the convolution layer twice differentiable. Are there any examples I can follow for implementing this feature?

Hi Marvin,

One pointer is just other PRs converting old-style functions to the new style:

https://github.com/pytorch/pytorch/pull/1507

https://github.com/pytorch/pytorch/pull/1426

Another is this detailed comment:

Primarily, you need to define the backward of ConvNdBackward as ConvNd, I think.

Hi @smth, I have found that the current master uses `ConvNd = torch._C._functions.ConvNd` in `torch.nn.functional.py`, so I cannot follow the same approach as https://github.com/pytorch/pytorch/pull/1507.

Finally, I found that the core function to modify is `ConvBackward::apply` in `torch.csrc.autograd.functions.convolution.cpp`, but I still have no idea how to convert the old-style functions to new ones.

Hi,

Have you worked out the math for the backward of the backward of a convolution?

Intuitively I would guess it corresponds to a regular conv with modified parameters?

More explicitly, if your forward is:

`out = conv2d(input, weight, bias, stride, padding, dilation, groups)`

Then the backward can be written as:

`gradInput = conv_transpose2d(gradOutput, weight, bias, new_stride, new_padding, new_output_padding, new_groups, new_dilation)`

where all the `new_*` parameters are computed from the parameters of the original conv.
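As a sanity check on that claim, here is a minimal 1-D, stride-1, no-padding sketch in plain Python (the function names are illustrative, not PyTorch's): the input gradient is exactly a transposed convolution of the output gradient with the same weight, which a finite-difference check confirms.

```python
def conv1d(x, w):
    """Valid cross-correlation of x with kernel w (stride 1, no padding)."""
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k))
            for i in range(len(x) - k + 1)]

def conv_transpose1d(g, w):
    """Transposed convolution: scatter each g[o] back through the kernel."""
    k = len(w)
    out = [0.0] * (len(g) + k - 1)
    for o, go in enumerate(g):
        for j in range(k):
            out[o + j] += go * w[j]
    return out

x = [1.0, 2.0, -1.0, 3.0, 0.5]
w = [0.5, -1.0, 2.0]
grad_out = [1.0, 1.0, 1.0]   # d(sum of outputs)/d(out)

# Analytic input gradient via the transposed convolution
grad_in = conv_transpose1d(grad_out, w)

# Numerical check: perturb each input element and compare
eps = 1e-6
for i in range(len(x)):
    xp = list(x)
    xp[i] += eps
    num = (sum(conv1d(xp, w)) - sum(conv1d(x, w))) / eps
    assert abs(num - grad_in[i]) < 1e-4
```

With stride, padding, and dilation, the same identity holds per dimension once the `new_*` parameters are chosen to line the sizes up.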

Now you would need the following:

`gradGradOutput = conv2d(gradGradInput, weight, bias, new_new_stride, new_new_padding, new_new_dilation, new_new_groups)`

where the `new_new_*` parameters are a function of the `new_*` parameters (and thus of the original parameters).
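A sketch of the size bookkeeping involved, using the standard per-dimension conv arithmetic (helper names are mine, not PyTorch's): with the same stride/padding/dilation, the transposed conv maps the gradOutput size back to the input size, with `output_padding` absorbing the loss from the forward pass's floor division.

```python
def conv_out_size(n, k, stride, padding, dilation):
    # standard conv output-size formula (one spatial dimension)
    return (n + 2 * padding - dilation * (k - 1) - 1) // stride + 1

def conv_transpose_out_size(n, k, stride, padding, output_padding, dilation):
    # standard transposed-conv output-size formula (one spatial dimension)
    return (n - 1) * stride - 2 * padding + dilation * (k - 1) + output_padding + 1

n_in, k, stride, padding, dilation = 8, 3, 2, 1, 1
n_out = conv_out_size(n_in, k, stride, padding, dilation)

# Choose output_padding so the transposed conv recovers n_in exactly:
output_padding = n_in - conv_transpose_out_size(n_out, k, stride, padding, 0, dilation)
assert conv_transpose_out_size(n_out, k, stride, padding,
                               output_padding, dilation) == n_in
```

And since `conv_transpose2d` is linear in gradOutput, differentiating gradInput with respect to gradOutput should give back a regular conv, consistent with the `new_new_*` formulation above.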

If you have these, I can help you to add this feature.

Hi Marvin,

I think a lot of the mathematics is coded up already in

https://github.com/pytorch/pytorch/blob/master/torch/nn/_functions/conv.py

in particular, the sizes should be there.

The main things to do are probably

- figure out how to optionally call forward with `_cudnn_info` instead of all the details, and apply that to `grad_output` in `_grad_input` and `_grad_output` instead. (Or make that a separate class and give back the info for the second derivative? I have not actually looked at it enough to be able to get it out.)
- do the usual `@staticmethod` and `self` → `ctx` business.

In terms of arguments, I think

- the derivative of output w.r.t. input is a ConvND with transpose “flipped”, stride same,
- the derivative of output w.r.t. filter is the ConvND of input with grad_output, dilation is the stride of the forward (based on the intuition that we need adjoints, maybe the other way round, too),
- the derivative of output w.r.t. bias should be something like it is for linear.

The definition of transpose seems to be so that you don’t need to reverse the filter or things like that.
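The second bullet (the weight gradient as a convolution of the input with grad_output) can be checked in a minimal 1-D, stride-1 sketch in plain Python; this ignores the dilation/stride swap, which only matters for stride > 1. Names are illustrative, not PyTorch's.

```python
def conv1d(x, w):
    """Valid cross-correlation of x with kernel w (stride 1, no padding)."""
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k))
            for i in range(len(x) - k + 1)]

def grad_weight(x, grad_out, k):
    # Weight gradient: correlate the input with grad_out; structurally
    # this is itself a convolution of x with grad_out as the kernel.
    return [sum(go * x[o + j] for o, go in enumerate(grad_out))
            for j in range(k)]

x = [1.0, 2.0, -1.0, 3.0, 0.5]
w = [0.5, -1.0, 2.0]
g = [1.0, 1.0, 1.0]          # d(sum of outputs)/d(out)

gw = grad_weight(x, g, len(w))

# Finite-difference check for each weight element
eps = 1e-6
for j in range(len(w)):
    wp = list(w)
    wp[j] += eps
    num = (sum(conv1d(x, wp)) - sum(conv1d(x, w))) / eps
    assert abs(num - gw[j]) < 1e-4

# The bias gradient is just the sum of grad_out, as for a linear layer.
grad_bias = sum(g)
```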

Best regards

Thomas

So the problem with ConvNd is that all the logic is in C++, so the task of adding support for grad of grad is purely a C++ thing. We haven't done it for any ops yet, so I can't show you a ready PR/implementation. The code @tom pointed to is not used anymore and should be removed.

Thanks a lot, I am now working on the math operations.

Hi all,

I have committed a PR: https://github.com/pytorch/pytorch/pull/1569. I am not sure about its correctness, so I hope for your feedback.

Hi all,

I have a problem doing a tensor transpose on CUDA tensors.

```
std::unique_ptr<Tensor> tp(weight->newTranspose(2, 3));
```

But when I use code like the above, it raises a RuntimeError. So how can I transpose a CUDA tensor?

Using `nn.ConvTranspose2d` with dilation still throws `RuntimeError: unsupported ConvNd parameters`.

Hi,

Dilated transposed convolutions are not supported by our backends (THNN and THCUNN).

I think it would work if you use `cudnn`, though, but I'm not 100% sure.