Extending PyTorch: Python vs. C++ vs. CUDA


I have been trying to implement a custom Conv2d module where grad_input (dx) and grad_weight (dw) are calculated using different grad_output (dy) values. I implemented this by extending torch.autograd.Function as in the PyTorch tutorials.

However, I am confused by the information in this link.

  • Is extending the autograd.Function not enough?
  • What is the difference between writing a new autograd function in Python vs C++?
  • How about the CUDA implementations in /torch/nn/blob/master/lib/THNN/generic/SpatialConvolutionMM.c where dx and dw are calculated? Should I change them too?

Here is my custom function:

import torch
import torch.nn.functional as F


class myCustomConv2d(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, w, bias=None, stride=1, padding=0, dilation=1, groups=1):
        ctx.save_for_backward(x, w, bias)
        ctx.stride = stride
        ctx.padding = padding
        ctx.dilation = dilation
        ctx.groups = groups
        out = F.conv2d(x, w, bias, stride, padding, dilation, groups)
        return out

    @staticmethod
    def backward(ctx, grad_output):
        input, weight, bias = ctx.saved_tensors
        stride = ctx.stride
        padding = ctx.padding
        dilation = ctx.dilation
        groups = ctx.groups
        grad_input = grad_weight = grad_bias = None

        # Use different grad_output values for dx and dw
        dy_for_inputs = myspecialfunction1(grad_output)
        dy_for_weights = myspecialfunction2(grad_output)

        grad_input = torch.nn.grad.conv2d_input(input.shape, weight, dy_for_inputs, stride, padding, dilation, groups)
        grad_weight = torch.nn.grad.conv2d_weight(input, weight.shape, dy_for_weights, stride, padding, dilation, groups)

        if bias is not None and ctx.needs_input_grad[2]:
            grad_bias = dy_for_weights.sum((0, 2, 3))

        # One gradient per forward() argument; non-tensor args get None
        return grad_input, grad_weight, grad_bias, None, None, None, None
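One way to sanity-check a Function like this is to make the two grad_output transforms identities and compare the resulting gradients against F.conv2d. Below is a minimal self-contained sketch; the identity stand-ins are hypothetical placeholders for the actual myspecialfunction1/2:

```python
import torch
import torch.nn.functional as F

# Hypothetical identity stand-ins; replace with the real transforms
def myspecialfunction1(dy): return dy
def myspecialfunction2(dy): return dy

class MyCustomConv2d(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, w, bias=None, stride=1, padding=0, dilation=1, groups=1):
        ctx.save_for_backward(x, w, bias)
        ctx.conf = (stride, padding, dilation, groups)
        return F.conv2d(x, w, bias, stride, padding, dilation, groups)

    @staticmethod
    def backward(ctx, grad_output):
        x, w, bias = ctx.saved_tensors
        stride, padding, dilation, groups = ctx.conf
        dy_in = myspecialfunction1(grad_output)
        dy_w = myspecialfunction2(grad_output)
        grad_input = torch.nn.grad.conv2d_input(x.shape, w, dy_in, stride, padding, dilation, groups)
        grad_weight = torch.nn.grad.conv2d_weight(x, w.shape, dy_w, stride, padding, dilation, groups)
        grad_bias = dy_w.sum((0, 2, 3)) if bias is not None else None
        return grad_input, grad_weight, grad_bias, None, None, None, None

x = torch.randn(2, 3, 8, 8, requires_grad=True)
w = torch.randn(4, 3, 3, 3, requires_grad=True)
MyCustomConv2d.apply(x, w).sum().backward()

# Reference gradients from the built-in op
x_ref = x.detach().clone().requires_grad_(True)
w_ref = w.detach().clone().requires_grad_(True)
F.conv2d(x_ref, w_ref).sum().backward()

print(torch.allclose(x.grad, x_ref.grad, atol=1e-4),
      torch.allclose(w.grad, w_ref.grad, atol=1e-4))
```

With identity transforms the gradients should match the built-in convolution exactly (up to floating-point tolerance); once the real transforms are plugged in, the gradients will intentionally differ.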

You don’t need to write the C++ extension: your current custom autograd.Function should work as it is.
Implementing the custom methods in C++ could yield a speedup, as shown in the tutorial, but it is optional.


Thank you very much!
And does the same apply to the CUDA implementations in /torch/nn/blob/master/lib/THNN/generic/SpatialConvolutionMM.c where dx and dw are calculated? I mean, is changing them also only a matter of speedup?

Yes, writing a custom CUDA operation could also yield a speedup, but it is likewise optional.
