How to modify ConvNdBackward to make nn.ConvNd be twice differentiable

Now, I’d like to contribute by making the convolution layer twice differentiable. Are there any examples I can follow for implementing this feature?

Hi Marvin,

One pointer is just other PRs converting old style functions to new:

Another is this detailed comment:

Primarily, you need to define the backward of ConvNdBackward as a ConvNd, I think.


Hi @smth, I have found that current master is just using ConvNd = torch._C._functions.ConvNd, so I cannot follow the same approach as:

Finally, I found that the core function to modify is ConvBackward::apply in torch/csrc/autograd/functions/convolution.cpp, but I have no idea how to convert this old-style function to the new style.


Did you work out all the math of what the backward of the backward of a convolution is?
Intuitively, I would guess it corresponds to a regular conv with modified parameters.

More explicitly, if your forward is:
out = conv2d(input, weight, bias, stride, padding, dilation, groups)

Then the backward can be written as:
gradInput = conv_transpose2d(gradOutput, weight, None, new_stride, new_padding, new_output_padding, new_groups, new_dilation)
where all the new_* parameters are computed from the parameters of the original conv (the bias does not contribute to gradInput, so None is passed).
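This identity is easy to check numerically for the default parameters. A minimal sketch, assuming a PyTorch build where the first backward of conv2d works, and using stride=1, padding=0 and no bias so that the new_* parameters are trivial:

```python
import torch
import torch.nn.functional as F

# With stride=1, padding=0, dilation=1, groups=1 and no bias, autograd's
# gradient w.r.t. the input equals a transposed convolution of the
# incoming gradient with the same weight.
x = torch.randn(1, 3, 8, 8, requires_grad=True)
w = torch.randn(4, 3, 3, 3)

out = F.conv2d(x, w)
grad_out = torch.randn_like(out)

grad_input, = torch.autograd.grad(out, x, grad_out)
manual = F.conv_transpose2d(grad_out, w)

print(torch.allclose(grad_input, manual, atol=1e-4))  # True
```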

Now you would need the following:
gradGradOutput = conv2d(gradGradInput, weight, None, new_new_stride, new_new_padding, new_new_dilation, new_new_groups)
where the new_new_* parameters are a function of the new_* parameters (and thus of the original parameters).
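On a build where convolution double backward is available, this relation can be verified with two calls to torch.autograd.grad. A sketch for the simplest parameters (stride=1, padding=0, no bias), where the new_new_* parameters are again trivial:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8, requires_grad=True)
w = torch.randn(4, 3, 3, 3)

out = F.conv2d(x, w)
g = torch.randn_like(out, requires_grad=True)  # gradOutput

# Keep the graph of the first backward so we can differentiate through it.
grad_input, = torch.autograd.grad(out, x, g, create_graph=True)

gg = torch.randn_like(x)  # gradGradInput
grad_grad_output, = torch.autograd.grad(grad_input, g, gg)

# Differentiating gradInput w.r.t. gradOutput is a plain conv2d of the
# perturbation with the same weight.
manual = F.conv2d(gg, w)
print(torch.allclose(grad_grad_output, manual, atol=1e-4))  # True
```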

If you have these, I can help you to add this feature.

Hi Marvin,

I think a lot of the mathematics is coded up already; in particular, the sizes should be there.

The main things to do are probably

  • figure out how to optionally call forward with _cudnn_info instead of all the details, and apply that to grad_output in _grad_input and _grad_output instead. (Or make that a separate class and give back the info for the second derivative? I have not actually looked at it closely enough to work that out.)
  • do the usual @staticmethod and self→ctx business.

In terms of arguments I think

  • the derivative of the output w.r.t. the input is a ConvNd with transpose “flipped” and the same stride,
  • the derivative of the output w.r.t. the filter is the ConvNd of the input with grad_output, where the dilation is the stride of the forward (based on the intuition that we need adjoints; maybe the other way round, too),
  • the derivative of the output w.r.t. the bias should be something like it is for linear.

The definition of transpose seems to be chosen so that you don’t need to reverse the filter or anything like that.
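The last two bullets can be sanity-checked numerically in the simplest case (stride=1, padding=0). This is only a sketch; the batch/channel transpose trick below is my own way of expressing "the ConvNd of the input with grad_output" in Python, not the exact ConvNd internals:

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 3, 8, 8)
w = torch.randn(4, 3, 3, 3, requires_grad=True)
b = torch.randn(4, requires_grad=True)

out = F.conv2d(x, w, b)
grad_out = torch.randn_like(out)
grad_w, grad_b = torch.autograd.grad(out, (w, b), grad_out)

# Weight gradient: a convolution of the input with grad_output, with
# batch and channel dims swapped so the batch sum becomes the channel
# reduction inside conv2d (with stride > 1 this would use dilation=stride).
manual_w = F.conv2d(x.transpose(0, 1), grad_out.transpose(0, 1)).transpose(0, 1)

# Bias gradient: reduce grad_output over batch and spatial dims, as for linear.
manual_b = grad_out.sum(dim=(0, 2, 3))

print(torch.allclose(grad_w, manual_w, atol=1e-4))  # True
print(torch.allclose(grad_b, manual_b, atol=1e-4))  # True
```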

Best regards


So the problem with ConvNd is that all the logic is in C++, so the task of adding support for grad of grad is purely a C++ thing. We haven’t done it for any ops yet, so I can’t show you a ready PR/implementation. The code @tom pointed to is not used anymore and should be removed.


Thanks a lot, I am now working on the math operations.


Hi all,

I have committed a PR. I am not sure about its correctness, so I hope for your discussion.
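One way to test such a PR numerically is torch.autograd's gradcheck/gradgradcheck, which compare the analytic first and second derivatives against finite differences. A sketch, assuming a build where conv double backward is implemented (double precision is needed for the numeric comparison to be tight enough):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 2, 5, 5, dtype=torch.double, requires_grad=True)
w = torch.randn(3, 2, 3, 3, dtype=torch.double, requires_grad=True)

# gradcheck verifies the first derivative, gradgradcheck the second;
# both return True on success and raise on a mismatch.
print(torch.autograd.gradcheck(F.conv2d, (x, w)))
print(torch.autograd.gradgradcheck(F.conv2d, (x, w)))
```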


Hi all,

I have a problem doing a tensor transpose for CUDA tensors.

std::unique_ptr<Tensor> tp(weight->newTranspose(2, 3));

But when I use code like the above, it raises a RuntimeError. So how can I transpose a CUDA tensor?
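I can't speak to the C++ internals here, but for comparison, the Python-level transpose works the same way for CPU and CUDA tensors and returns a view without copying:

```python
import torch

# Python-level analogue of the C++ newTranspose(2, 3) call above:
# swap the last two dimensions of a weight tensor.
w = torch.randn(4, 3, 3, 5)
wt = w.transpose(2, 3)
print(wt.shape)  # torch.Size([4, 3, 5, 3])

# The same call works on a CUDA tensor when one is available.
if torch.cuda.is_available():
    print(w.cuda().transpose(2, 3).shape)
```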


Using nn.ConvTranspose2d with dilation still throws RuntimeError: unsupported ConvNd parameters


Dilated transposed convolutions are not supported by our backends (THNN and THCUNN).
I think it would work if you use cuDNN, though I’m not 100% sure.
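For reference, on recent PyTorch releases dilated transposed convolution is supported, so a snippet like the following runs; on the old THNN/THCUNN backends discussed here it raised the RuntimeError above:

```python
import torch
import torch.nn as nn

m = nn.ConvTranspose2d(3, 4, kernel_size=3, dilation=2)
out = m(torch.randn(1, 3, 8, 8))

# H_out = (H_in - 1) * stride - 2 * padding + dilation * (k - 1) + 1
#       = (8 - 1) * 1 - 0 + 2 * 2 + 1 = 12
print(out.shape)  # torch.Size([1, 4, 12, 12])
```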