Implementing a custom convolution using conv2d_input and conv2d_weight

fsds · May 23, 2018, 8:27am

Hi,
I have been trying to implement a custom convolutional layer.
In order to do that, I’m using torch.nn.functional.conv2d in the forward pass, and both torch.nn.grad.conv2d_weight and torch.nn.grad.conv2d_input in the backward pass.
I started getting OOM exceptions when entering torch.nn.grad.conv2d_weight.

My question is, what exactly is the difference between using:

torch.nn.functional.conv2d(x, w)

and

MyConv.apply(x, w)

when MyConv is implemented as follows:

import torch
import torch.nn.functional as F
import torch.nn.grad
from torch.autograd import Function

class MyConv(Function):
    @staticmethod
    def forward(ctx, x, w):
        # Keep the input and the weight; both are needed to compute the gradients
        ctx.save_for_backward(x, w)
        return F.conv2d(x, w)

    @staticmethod
    def backward(ctx, grad_output):
        x, w = ctx.saved_tensors
        x_grad = w_grad = None
        if ctx.needs_input_grad[0]:
            x_grad = torch.nn.grad.conv2d_input(x.shape, w, grad_output)
        if ctx.needs_input_grad[1]:
            w_grad = torch.nn.grad.conv2d_weight(x, w.shape, grad_output)
        return x_grad, w_grad

Why would torch.nn.grad.conv2d_weight raise an OOM exception when torch.nn.functional.conv2d (which I assume also uses torch.nn.grad.conv2d_weight in its backward pass) does not?

Thanks.
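A minimal sketch (assuming a CPU run with small double-precision tensors; the shapes are arbitrary) that compares the torch.nn.grad helpers with the gradients autograd produces for F.conv2d. Both compute the same mathematical gradients, so any difference between the two approaches lies in how the gradient is computed and how much memory that takes, not in the result:

```python
import torch
import torch.nn.functional as F
import torch.nn.grad

torch.manual_seed(0)
x = torch.randn(2, 3, 8, 8, dtype=torch.double, requires_grad=True)
w = torch.randn(5, 3, 3, 3, dtype=torch.double, requires_grad=True)

# Autograd path: the built-in backward of F.conv2d fills x.grad and w.grad
y = F.conv2d(x, w)
grad_output = torch.randn_like(y)
y.backward(grad_output)

# Explicit path: the Python helpers from torch.nn.grad
x_grad = torch.nn.grad.conv2d_input(x.shape, w.detach(), grad_output)
w_grad = torch.nn.grad.conv2d_weight(x.detach(), w.shape, grad_output)

print(torch.allclose(x_grad, x.grad), torch.allclose(w_grad, w.grad))
```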

Tony_Lee · June 4, 2018, 10:31am

Hi, have you solved your problem? I want to define a conv2d layer too; could you share your code?

fsds · June 4, 2018, 11:12am

Yes.
I’ve avoided this by directly calling cudnn_convolution_backward_input and cudnn_convolution_backward_weight (by following this example https://github.com/pytorch/extension-cpp, and adding two c++ functions that call the cudnn functions) instead of using torch.nn.grad.conv2d_input and torch.nn.grad.conv2d_weight.
You might not have that problem, though (it depends on your network and your GPU model). You should first check whether torch.nn.grad.conv2d_weight and torch.nn.grad.conv2d_input work for your model without raising an out-of-memory exception.
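One way to run that check, as a sketch: wrap each call and catch the CUDA out-of-memory error. runs_without_oom is a hypothetical helper and the tensor shapes are placeholders; substitute the shapes of your own layer. The sketch falls back to the CPU when no GPU is present, where an OOM is unlikely at these sizes.

```python
import torch
import torch.nn.grad


def runs_without_oom(fn, *args, **kwargs):
    """Return True if fn(*args, **kwargs) completes, False on an out-of-memory error."""
    try:
        fn(*args, **kwargs)
        return True
    except RuntimeError as err:  # newer releases raise torch.cuda.OutOfMemoryError, a RuntimeError subclass
        if "out of memory" not in str(err):
            raise  # unrelated errors should still surface
        torch.cuda.empty_cache()  # release cached blocks before trying anything else
        return False


device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(4, 16, 32, 32, device=device)
w = torch.randn(32, 16, 3, 3, device=device)
grad_output = torch.randn(4, 32, 30, 30, device=device)  # output of a 3x3 conv, no padding

print("conv2d_input fits: ", runs_without_oom(torch.nn.grad.conv2d_input, x.shape, w, grad_output))
print("conv2d_weight fits:", runs_without_oom(torch.nn.grad.conv2d_weight, x, w.shape, grad_output))
```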

Tony_Lee · June 4, 2018, 11:34am

I used your code above to modify mine, but I got an error.
Does F.conv2d(x, w) have both forward and backward methods?

Tony_Lee · June 4, 2018, 11:37am

I just want to write a custom conv2d layer similar to the built-in API.
Can I just modify the PyTorch Python interface?
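For what it's worth, a sketch of both points, assuming a reasonably recent PyTorch: F.conv2d is already differentiable through autograd, so it needs no hand-written backward, and a custom layer can subclass nn.Conv2d and override forward instead of editing PyTorch's own Python files. MyConv2d is a hypothetical name; its forward just calls F.conv2d, and that is the place where a custom Function could be plugged in.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MyConv2d(nn.Conv2d):
    """Hypothetical subclass: reuses nn.Conv2d's constructor and parameters."""

    def forward(self, input):
        # Replace this call with a custom computation (e.g. SomeFunction.apply(...)).
        # As written it behaves like nn.Conv2d with the default zero padding.
        return F.conv2d(input, self.weight, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)


layer = MyConv2d(3, 8, kernel_size=3, padding=1)
out = layer(torch.randn(2, 3, 16, 16))
out.sum().backward()  # autograd provides the backward of F.conv2d
print(out.shape, layer.weight.grad.shape)
```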

fsds · June 4, 2018, 2:14pm

What exactly is the error?

Tony_Lee · June 5, 2018, 1:07am

It shows the following:

File "/home/lth/anaconda3/lib/python3.5/site-packages/torch/nn/functional.py", line 90, in conv2d
return f(input, weight, bias)

TypeError: argument 0 is not a Variable

My code looks like this:
import torch
import torch.nn.functional as F
import torch.nn.grad
from torch.autograd import Function
from torch.nn.modules.conv import _ConvNd
from torch.nn.modules.utils import _pair

class Conv2d(_ConvNd):

    def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True):
        kernel_size = _pair(kernel_size)
        stride = _pair(stride)
        padding = _pair(padding)
        dilation = _pair(dilation)
        super(Conv2d, self).__init__(in_channels, out_channels, kernel_size, stride, padding, dilation, False, _pair(0), groups, bias)

    def forward(self, input):
        # conv2d is bound at module level below, after Conv2dF exists
        return conv2d(input, self.weight, self.bias, self.stride, self.padding, self.dilation, self.groups)

class Conv2dF(Function):

    @staticmethod
    def forward(cxt, input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1):
        cxt.save_for_backward(input, weight, bias)
        return F.conv2d(input, weight, bias, stride, padding, dilation, groups)

    @staticmethod
    def backward(cxt, grad_output):
        input, weight, bias = cxt.saved_tensors
        grad_input = grad_weight = grad_bias = None

        if cxt.needs_input_grad[0]:
            grad_input = torch.nn.grad.conv2d_input(input.shape, weight, grad_output)

        if cxt.needs_input_grad[1]:
            grad_weight = torch.nn.grad.conv2d_weight(input, weight.shape, grad_output)

        if bias is not None and cxt.needs_input_grad[2]:
            grad_bias = grad_output.sum(0).squeeze(0)

        if bias is not None:
            return grad_input, grad_weight, grad_bias
        else:
            return grad_input, grad_weight

# Must come after the class definition, otherwise Conv2dF is undefined here
conv2d = Conv2dF.apply

Thanks so much!

fsds · June 5, 2018, 7:20am

It looks like you're using an old version of PyTorch. Try upgrading to PyTorch 0.4.0 and see if it works.
To check which version you have, run:

import torch
print(torch.__version__) #should be 0.4.0

Tony_Lee · June 5, 2018, 7:25am

I have just updated PyTorch to 0.4.0, but I still get the same error.
Could you show me your demo code?

negar_goli · February 21, 2019, 12:04am

Could you please share the code in which you call cudnn_convolution_backward_weight?

hanspinckaers · April 1, 2019, 8:10am

For other people googling this, I posted some code in this thread: Cuda error with cudnn convolution backward weight function

Shi_Heng · August 10, 2019, 1:07pm

Hi, this OOM exception actually comes from the Python implementation of conv2d_weight.
To compute the weight gradient, that implementation repeats the output gradient along its channel dimension (in_channels // groups times). cuDNN processes the data block by block without allocating extra memory, whereas the Python API uses repeat(), which allocates a much larger copy of the output-gradient tensor filled with duplicated data.
You can fix this by replacing the repeat with a loop in conv2d_weight.
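A minimal sketch of that idea (not PyTorch's actual implementation): it computes the weight gradient one input channel at a time, treating the batch dimension as the channel dimension, so no repeated copy of grad_output is ever allocated. It assumes groups=1 and integer or tuple stride/padding/dilation; conv2d_weight_loop is a hypothetical name. The Python loop trades some speed for lower peak memory.

```python
import torch
import torch.nn.functional as F


def conv2d_weight_loop(input, weight_size, grad_output, stride=1, padding=0, dilation=1):
    """Weight gradient of F.conv2d for groups=1, computed one input channel at a time."""
    out_channels, in_channels, kh, kw = weight_size
    # (N, O, Ho, Wo) -> (O, N, Ho, Wo): grad_output acts as a kernel over the batch "channels"
    grad_t = grad_output.transpose(0, 1)
    grad_weight = input.new_zeros(weight_size)
    for c in range(in_channels):
        # (N, 1, H, W) -> (1, N, H, W): a single input channel with batch as channels
        x_c = input[:, c:c + 1].transpose(0, 1)
        # With grad_output as the kernel, the roles of stride and dilation swap
        g = F.conv2d(x_c, grad_t, stride=dilation, padding=padding, dilation=stride)
        # g is (1, O, >=kh, >=kw); extra trailing rows/cols correspond to unused positions
        grad_weight[:, c] = g[0, :, :kh, :kw]
    return grad_weight
```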

Mostafa_Elhoushi · September 10, 2019, 11:15pm

I think the expression for grad_bias should be fixed to:

grad_bias = grad_output.sum((0, 2, 3))
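The reason, sketched below under the assumption of small CPU tensors: the bias is broadcast over the batch, height and width dimensions, so its gradient sums over dims (0, 2, 3), which already yields shape (out_channels,). A trailing .squeeze(0) is not needed, and with a single output channel it would turn the shape-(1,) result into a 0-d tensor that no longer matches the bias.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(2, 3, 8, 8, dtype=torch.double)
w = torch.randn(1, 3, 3, 3, dtype=torch.double)  # a single output channel
b = torch.zeros(1, dtype=torch.double, requires_grad=True)

y = F.conv2d(x, w, b)
grad_output = torch.randn_like(y)
y.backward(grad_output)  # autograd's reference gradient lands in b.grad

# Sum over batch, height and width: one value per output channel
grad_bias = grad_output.sum((0, 2, 3))
print(grad_bias.shape, torch.allclose(grad_bias, b.grad))
```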

Mostafa_Elhoushi · November 18, 2019, 11:27pm

Thanks @hanspinckaers for sharing this. I managed to run the code; however, it is still slow, much slower than using PyTorch's conv2d and letting autograd do the work.

hanspinckaers · April 10, 2020, 11:36am

Hi Mostafa, in my benchmarks it seems to run at the same speed. How did you benchmark it?
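A rough timing sketch, assuming wall-clock timing with time.perf_counter is precise enough for this comparison; time_backward, the two wrapper functions and the shapes are hypothetical. The details that matter are a warm-up call and torch.cuda.synchronize(), because CUDA kernels run asynchronously and the clock would otherwise stop before the GPU has finished. Newer PyTorch releases also ship torch.utils.benchmark.Timer, which takes care of these details.

```python
import time

import torch
import torch.nn.functional as F
import torch.nn.grad


def time_backward(backward_fn, n_iters=10):
    """Average wall-clock seconds per call of backward_fn."""
    use_cuda = torch.cuda.is_available()
    backward_fn()  # warm-up: cuDNN algorithm selection, allocator caching
    if use_cuda:
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_iters):
        backward_fn()
    if use_cuda:
        torch.cuda.synchronize()  # wait for queued kernels before stopping the clock
    return (time.perf_counter() - start) / n_iters


device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(4, 16, 32, 32, device=device, requires_grad=True)
w = torch.randn(32, 16, 3, 3, device=device, requires_grad=True)
y = F.conv2d(x, w, padding=1)
grad_output = torch.randn_like(y)


def autograd_backward():
    # Built-in backward of F.conv2d; the graph is kept so it can be reused
    torch.autograd.grad(y, (x, w), grad_output, retain_graph=True)


def helpers_backward():
    # The explicit Python helpers from torch.nn.grad
    with torch.no_grad():
        torch.nn.grad.conv2d_input(x.shape, w, grad_output, padding=1)
        torch.nn.grad.conv2d_weight(x, w.shape, grad_output, padding=1)


print(f"autograd backward:     {time_backward(autograd_backward) * 1e3:.2f} ms")
print(f"torch.nn.grad helpers: {time_backward(helpers_backward) * 1e3:.2f} ms")
```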

fsds · April 10, 2020, 12:38pm

If you mean running cudnn_convolution_backward_input() and cudnn_convolution_backward_weight() is slower than calling conv2d and letting autograd do the back-propagation, then it makes sense because you’re now calling two functions separately, and in addition (based on your previous comment) calculating the grad_bias.
If you want to override the whole back-propagation process of Conv2d and still have the same processing time, you should use the combined cudnn_convolution_backward() that returns gradients w.r.t the input, gradients w.r.t the weights and gradients w.r.t the biases in that order.

The question here and @hanspinckaers's solution are about replacing only torch.nn.grad.conv2d_weight, which is very memory-hungry, with cudnn_convolution_backward_weight().
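In recent PyTorch releases there is also a generic aten operator, torch.ops.aten.convolution_backward, that returns the gradients with respect to the input, the weight and the bias from one call (as far as I know it uses cuDNN on the GPU when available). A sketch of a custom Function built on it, assuming a recent PyTorch (2.x, for example): this is an internal operator rather than a documented public API, so its argument list may change between versions. FusedConv2dFunction and _pair are hypothetical names, and string padding such as 'same' is not handled.

```python
import torch
import torch.nn.functional as F


def _pair(v):
    # Hypothetical helper: expand an int to a 2-tuple, leave tuples/lists as tuples
    return (v, v) if isinstance(v, int) else tuple(v)


class FusedConv2dFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1):
        ctx.save_for_backward(input, weight)
        ctx.has_bias = bias is not None
        # Non-tensor arguments can live directly on ctx
        ctx.conf = (_pair(stride), _pair(padding), _pair(dilation), groups)
        return F.conv2d(input, weight, bias, stride, padding, dilation, groups)

    @staticmethod
    def backward(ctx, grad_output):
        input, weight = ctx.saved_tensors
        stride, padding, dilation, groups = ctx.conf
        output_mask = [ctx.needs_input_grad[0], ctx.needs_input_grad[1],
                       ctx.has_bias and ctx.needs_input_grad[2]]
        # One call returns (grad_input, grad_weight, grad_bias);
        # entries whose mask is False come back as None
        grad_input, grad_weight, grad_bias = torch.ops.aten.convolution_backward(
            grad_output, input, weight,
            [weight.shape[0]] if ctx.has_bias else None,  # bias_sizes
            stride, padding, dilation,
            False, [0, 0], groups,  # transposed, output_padding, groups
            output_mask)
        # One return value per forward input; the four config arguments get None
        return grad_input, grad_weight, grad_bias, None, None, None, None
```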

Mostafa_Elhoushi · April 16, 2020, 1:45pm

Thanks @fsds and @hanspinckaers!

@hanspinckaers: Thank you for following up. It has been a while since I did the benchmark. I recall I was training ResNet18 on ImageNet. Using PyTorch's torch.nn.grad.conv2d_input(...) and torch.nn.grad.conv2d_weight(...) was probably twice as slow and used twice as much memory as letting PyTorch derive the backward pass of Conv2d automatically.
The method you provided in this link made things a bit faster, but it was still much slower than PyTorch's automatic backward pass.

@fsds: Thanks for your answer. Are you referring to std::tuple<at::Tensor,at::Tensor> cudnn_convolution_backward(...) in:

github.com: pytorch/pytorch/blob/master/aten/src/ATen/native/cudnn/Conv.cpp#L1064


    bool benchmark, bool deterministic)
{
  TensorArg grad_output{ grad_output_t, "grad_output", 1 },
            weight{ weight_t, "weight", 2 };
  return cudnn_convolution_backward_input(
      "cudnn_convolution_backward_input",
      input_size, grad_output, weight,
      padding, stride, dilation, groups, benchmark, deterministic);
}


std::tuple<at::Tensor,at::Tensor> cudnn_convolution_backward(
    const at::Tensor& input, const at::Tensor& grad_output_t, const at::Tensor& weight,
    IntArrayRef padding, IntArrayRef stride, IntArrayRef dilation, int64_t groups,
    bool benchmark, bool deterministic, std::array<bool,2> output_mask) {


  Tensor grad_output = grad_output_t.contiguous(input.suggest_memory_format());


  Tensor grad_input, grad_weight;
  if (input.numel() == 0) {
    if (output_mask[0]) {
      grad_input = at::empty_like(input, LEGACY_CONTIGUOUS_MEMORY_FORMAT);

So I just need to create a Python wrapper for it and call it in my backward pass?

fsds · April 16, 2020, 2:10pm

Yes, that is what I'm referring to, although something is a little odd here: in PyTorch 1.2 (the version I'm currently using) this function returns 3 tensors (std::tuple<at::Tensor,at::Tensor,at::Tensor>: grad_input, grad_weight, grad_bias), whereas in the current state of the repository it returns only 2 (std::tuple<at::Tensor,at::Tensor>: grad_input and grad_weight).
You can see here the implementation of 1.2:

github.com: pytorch/pytorch/blob/v1.2.0/aten/src/ATen/native/cudnn/Conv.cpp#L39


    IntArrayRef padding, IntArrayRef stride, IntArrayRef dilation, int64_t groups,
    bool benchmark, bool deterministic) {
  AT_ERROR("cudnn_convolution_backward_weight: ATen not compiled with cuDNN support");
}


at::Tensor cudnn_convolution_backward_bias(
    const at::Tensor& grad_output) {
  AT_ERROR("cudnn_convolution_backward_bias: ATen not compiled with cuDNN support");
}


std::tuple<at::Tensor,at::Tensor,at::Tensor> cudnn_convolution_backward(
    const at::Tensor& input, const at::Tensor& grad_output, const at::Tensor& weight,
    IntArrayRef padding, IntArrayRef stride, IntArrayRef dilation, int64_t groups,
    bool benchmark, bool deterministic, std::array<bool,3> output_mask) {
  AT_ERROR("cudnn_convolution_backward: ATen not compiled with cuDNN support");
}


at::Tensor cudnn_convolution_transpose(
    const at::Tensor& input, const at::Tensor& weight, const at::Tensor& bias /* optional */,
    IntArrayRef padding, IntArrayRef output_padding, IntArrayRef stride, IntArrayRef dilation,
    int64_t groups, bool benchmark, bool deterministic) {

Either way, using this should give you better performance.

ptrblck · April 17, 2020, 12:31am

We removed the bias, as the backward pass was faster with native PyTorch ops and had some other advantages as seen in this PR.

r3coder · March 4, 2021, 4:29am

This doesn't work if the stride and padding differ from the default values, so I've edited it a bit. Otherwise I get:

ValueError: requested an input grad size of [4, 4], but valid sizes range from [6, 6] to [6, 6] (for a grad_output of torch.Size([4, 4]))

So I saved the extra arguments with save_for_backward so that backward can use them:

import numpy as np
import torch
import torch.nn.functional as F
import torch.nn.grad

class Conv2dFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1):
        # Save arguments to context to use on backward
        # WARNING: this only works if stride, padding and dilation are plain ints, not tuples!
        confs = torch.from_numpy(np.array([stride, padding, dilation, groups]))
        ctx.save_for_backward(input, weight, bias, confs)

        # Compute Convolution
        return F.conv2d(input, weight, bias=bias, stride=stride, padding=padding, dilation=dilation, groups=groups)

    @staticmethod
    def backward(ctx, grad_output):
        # Load saved tensors
        input, weight, bias, confs = ctx.saved_tensors
        # tolist() turns the saved tensor back into plain Python ints
        stride, padding, dilation, groups = confs.tolist()

        # Calculate Gradient
        grad_input = grad_weight = grad_bias = None
        if ctx.needs_input_grad[0]:
            grad_input = torch.nn.grad.conv2d_input(input.shape, weight, grad_output, stride, padding, dilation, groups)

        if ctx.needs_input_grad[1]:
            grad_weight = torch.nn.grad.conv2d_weight(input, weight.shape, grad_output, stride, padding, dilation, groups)

        # WARNING: the bias gradient may be buggy; remove it if so
        if bias is not None and ctx.needs_input_grad[2]:
            grad_bias = grad_output.sum(0).squeeze(0)

        # WARNING: the bias gradient may be buggy; remove it if so
        if bias is not None:
            return grad_input, grad_weight, grad_bias, None, None, None, None
        else:
            return grad_input, grad_weight, None, None, None, None, None

Since forward also takes stride, padding, dilation and groups, backward has to return an extra None for each of them.

It works fine, but is there a more elegant way to do this without returning useless Nones?

Also, I think the bias gradient is buggy. I'll fix it when I have more spare time.
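On the Nones: as far as I know there is no way around them, because backward must return one value per input of forward, and None is the expected gradient for non-tensor arguments. What can be simplified is the bookkeeping: non-tensor arguments such as stride or padding can be stored as plain attributes on ctx instead of being packed into a tensor, which also works when they are tuples. A sketch along those lines, also using the bias reduction over dims (0, 2, 3) suggested earlier in the thread; CtxConv2dFunction is a hypothetical name, not r3coder's code:

```python
import torch
import torch.nn.functional as F
import torch.nn.grad


class CtxConv2dFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1):
        # Tensors go through save_for_backward; plain Python values (ints or tuples)
        # can simply be stored as attributes on ctx
        ctx.save_for_backward(input, weight, bias)
        ctx.conf = (stride, padding, dilation, groups)
        return F.conv2d(input, weight, bias, stride, padding, dilation, groups)

    @staticmethod
    def backward(ctx, grad_output):
        input, weight, bias = ctx.saved_tensors
        stride, padding, dilation, groups = ctx.conf
        grad_input = grad_weight = grad_bias = None
        if ctx.needs_input_grad[0]:
            grad_input = torch.nn.grad.conv2d_input(
                input.shape, weight, grad_output, stride, padding, dilation, groups)
        if ctx.needs_input_grad[1]:
            grad_weight = torch.nn.grad.conv2d_weight(
                input, weight.shape, grad_output, stride, padding, dilation, groups)
        if bias is not None and ctx.needs_input_grad[2]:
            # The bias is added at every (batch, height, width) position
            grad_bias = grad_output.sum((0, 2, 3))
        # One return value per forward input; the four config arguments get None
        return grad_input, grad_weight, grad_bias, None, None, None, None
```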