Hello everyone,
I’m trying to implement a custom convolution layer. However, I only want to customize the forward function and leave the backward function unchanged.
Following the guidance from https://pytorch.org/docs/stable/notes/extending.html, I think the implementation would look something like this:
```python
import torch
from torch.autograd import Function

class MyConv2d(Function):
    @staticmethod
    def forward(ctx, input, weight):
        ctx.save_for_backward(input, weight)
        input_temp, weight_temp = f(input, weight)
        output = g(input_temp, weight_temp)
        return output

    @staticmethod
    def backward(ctx, grad_output):
        input, weight = ctx.saved_tensors
        # Compute the backward pass with the original inputs,
        # i.e., compute the gradients of output directly w.r.t. input
        # and weight, ignoring the functions in the middle (f(·), g(·)).
        return grad_input, grad_weight
```
where f(·) and g(·) are custom functions for computing the convolution. One example would be to flatten the input, turning the convolution into a matrix multiplication; in this case, f(·) could be unfold(·). Since the backward pass of unfold(·) is very slow, I don’t want this operation to be part of the computational graph. In other words, no matter what f(·) and g(·) are, I would like autograd to always treat this layer as `output = nn.conv2d(input, weight)`, so that the computational graph contains only `input`, `weight`, and `output`, without any intermediate variables or functions.
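To make the f(·)/g(·) example above concrete, here is a sketch of an im2col-style forward where f(·) is `unfold(·)` and g(·) is a matrix multiplication. The shapes and the `my_conv_forward` name are my own illustration, assuming stride 1 and no padding:

```python
import torch
import torch.nn.functional as F

def my_conv_forward(input, weight):
    # input: (N, C, H, W), weight: (O, C, kh, kw); stride=1, padding=0 assumed.
    n, c, h, w = input.shape
    out_ch, _, kh, kw = weight.shape
    cols = F.unfold(input, kernel_size=(kh, kw))  # f(·): (N, C*kh*kw, L)
    out = weight.view(out_ch, -1) @ cols          # g(·): (N, O, L)
    return out.view(n, out_ch, h - kh + 1, w - kw + 1)
```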
The pseudocode above requires customizing the gradient calculation of the convolution. I tried several approaches, but none of them reaches the speed of the original conv2d function. I’m wondering:
- How can I implement the backward function of a conv layer so that it is as fast as the original one in PyTorch?
- Alternatively, are there any other simple ways to handle this problem?
Please help and thanks!!