Implementing a custom convolution using conv2d_input and conv2d_weight

Thanks @fsds and @hanspinckaers!

@hanspinckaers: Thank you for following up. It has been a while since I did the benchmark. I recall I was training ResNet18 on ImageNet. Using PyTorch's torch.nn.grad.conv2d_input(...) and torch.nn.grad.conv2d_weight(...) was roughly twice as slow and used about twice as much memory compared to letting PyTorch derive the backward pass of Conv2d automatically.
When I tried the method you provided in that link, it made things a bit faster, but it was still much slower than PyTorch's automatic backward pass.
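For reference, this is roughly the kind of custom autograd.Function I was benchmarking — a minimal sketch with bias, dilation, and groups omitted for brevity (the class name is hypothetical):

```python
import torch
import torch.nn.functional as F
from torch.nn.grad import conv2d_input, conv2d_weight

class ManualConv2d(torch.autograd.Function):
    # Conv2d with a hand-written backward via torch.nn.grad.

    @staticmethod
    def forward(ctx, input, weight, stride=1, padding=0):
        ctx.save_for_backward(input, weight)
        ctx.stride, ctx.padding = stride, padding
        return F.conv2d(input, weight, stride=stride, padding=padding)

    @staticmethod
    def backward(ctx, grad_output):
        input, weight = ctx.saved_tensors
        grad_input = conv2d_input(input.shape, weight, grad_output,
                                  stride=ctx.stride, padding=ctx.padding)
        grad_weight = conv2d_weight(input, weight.shape, grad_output,
                                    stride=ctx.stride, padding=ctx.padding)
        # One gradient per forward argument; stride/padding are non-tensors.
        return grad_input, grad_weight, None, None

# Usage (apply takes positional arguments only):
# out = ManualConv2d.apply(x, w, 1, 1)
```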

@fsds: Thanks for your answer. Are you referring to std::tuple&lt;at::Tensor,at::Tensor&gt; cudnn_convolution_backward(...) in:


So I just need to create a Python wrapper for it and invoke it in our backward pass?
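A rough sketch of what that backward could look like. Note this is an assumption on my part: the C++ signature of cudnn_convolution_backward has changed across PyTorch releases, and in recent versions the backend-specific backward ops were consolidated into aten::convolution_backward, which is callable from Python via torch.ops without a C++ extension (the class name here is hypothetical, bias omitted):

```python
import torch
import torch.nn.functional as F

class CudnnConv2d(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input, weight, stride=1, padding=0):
        ctx.save_for_backward(input, weight)
        ctx.stride, ctx.padding = stride, padding
        return F.conv2d(input, weight, stride=stride, padding=padding)

    @staticmethod
    def backward(ctx, grad_output):
        input, weight = ctx.saved_tensors
        # aten::convolution_backward dispatches to the cuDNN kernels
        # for CUDA tensors (assumes a recent PyTorch with the
        # consolidated convolution backward op).
        grad_input, grad_weight, _ = torch.ops.aten.convolution_backward(
            grad_output, input, weight,
            None,                                 # bias_sizes (no bias here)
            [ctx.stride] * 2, [ctx.padding] * 2,  # stride, padding
            [1, 1],                               # dilation
            False, [0, 0], 1,                     # transposed, output_padding, groups
            [True, True, False])                  # output_mask: input, weight, bias
        return grad_input, grad_weight, None, None
```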