I'm not sure how to write a custom autograd.Function

Through the response in this post, I’ve identified the issue in my code.

The error RuntimeError: element 0 of variables does not require grad and does not have a grad_fn occurred because I was recreating a tensor inside my custom function, so the returned out is detached from the computational graph.
My model and implemented function are structured as follows:

def forward(self, x):
    x = self.head(x)

    res = self.body(x)
    res += x

    x = self.tail(res)           # x.grad_fn is a valid backward node
    out = custom_function(x)     # out.grad_fn is None
    out = out.cuda()

    return out

def custom_function(im):         # custom upsample function
    new_im = custom_module1(im)
    upsampled_im = new_im.repeat_interleave(2, dim=2).repeat_interleave(2, dim=3)
    # calculation part using new_im
    out = custom_module2(upsampled_im)   # similar behavior to custom_module1
    return out

def custom_module1(im):
    b, _, h, w = im.shape
    new_im = torch.zeros((b, 3, h, w), dtype=torch.float32)  # recreating a tensor, detached from im
    # calculation part using im
    return new_im

However, due to my constraints, I cannot change that behavior.
In this case, I believe I need to write a custom autograd.Function, but I'm having trouble understanding the example code provided for it, and I'm unsure how to go about it.

In my case, the function performs upsampling, and I find it challenging to formulate the backward (derivative) function.

Can anyone help me with this?

If you are creating a new tensor, it does not have a gradient history, and previous operations don't affect this tensor in any way.

I don't think the actual backward computation is the issue in your use case, but rather the lack of a computation graph. Since the tensor is created as a new leaf tensor, not via upsampling, no gradients will flow back.
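For reference, the general autograd.Function pattern for a fixed 2x nearest-neighbour upsample is sketched below (the class name is just illustrative, and repeat_interleave is already differentiable, so this only shows how forward and backward fit together). Note that a hand-written backward still cannot reconnect a graph that was cut by creating a fresh tensor:

import torch

class NearestUpsample2x(torch.autograd.Function):
    # Sketch only: repeat_interleave is already differentiable,
    # so in practice you would not need a custom Function for this.

    @staticmethod
    def forward(ctx, x):
        # Copy each pixel into a 2x2 block.
        return x.repeat_interleave(2, dim=2).repeat_interleave(2, dim=3)

    @staticmethod
    def backward(ctx, grad_output):
        # Each input pixel was copied into a 2x2 output block,
        # so its gradient is the sum over that block.
        b, c, h2, w2 = grad_output.shape
        return grad_output.reshape(b, c, h2 // 2, 2, w2 // 2, 2).sum(dim=(3, 5))

x = torch.randn(1, 3, 4, 4, requires_grad=True)
out = NearestUpsample2x.apply(x)
print(out.grad_fn)                  # NearestUpsample2xBackward -> graph is intact

detached = torch.zeros(1, 3, 4, 4)  # a freshly created tensor, as in custom_module1
print(detached.grad_fn)             # None -> no path back to x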

Then my custom_function won’t be usable for network training…

Thank you for your response. It was very helpful.

Hello, ptrblck,

def custom_function(im):         # custom upsample function, same as before
    im = custom_module1(im)
    upsampled_im = im.repeat_interleave(2, dim=2).repeat_interleave(2, dim=3)
    # calculation part using im
    out = custom_module2(upsampled_im)   # similar behavior to custom_module1
    return out

def custom_module1(im):
    # map channel 0 through the lookup table; indexing requires integer values
    im[:, 0, :, :] = transfer_table[im[:, 0, :, :].to(torch.int64), 0]
    return im

I attempted to train the network by modifying my custom_module1 code as above, but encountered another issue: the network isn't learning properly.

It seems the problem may arise from the transformation of values caused by the transfer_table, which is a nonlinear mapping.

I think gradients might not be able to flow back through the network because of the mapping via this transfer_table.
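A quick check (with made-up table values, just to reproduce the behavior) seems to confirm that the integer cast inside custom_module1 removes the gradient history:

import torch

transfer_table = torch.rand(256, 3)                # made-up table values for this check
im = torch.rand(1, 3, 4, 4, requires_grad=True)

idx = (im[:, 0, :, :] * 255).to(torch.int64)       # casting to an integer type drops the grad history
mapped = transfer_table[idx, 0]

print(idx.requires_grad, mapped.requires_grad)     # False False -> no gradient path back to im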

I'm curious whether it's possible to train a deep learning network using such a look-up table, or whether there are any references that use this approach.

You can train the lookup table itself, as is also done in embedding layers, but not the input, since you are detaching it by casting it to an integer type. Integer types are not usefully differentiable, since their gradient would be zero everywhere and undefined (or Inf) at the rounding points.
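As a rough sketch (the table size of 256 entries and the shapes here are placeholders), making the table an nn.Embedding lets the table entries receive gradients, while the integer indices themselves stay non-differentiable:

import torch
import torch.nn as nn

table = nn.Embedding(256, 1)                       # learnable lookup table with 256 entries

im = torch.randint(0, 256, (2, 1, 8, 8))           # integer "image" used as indices
mapped = table(im.squeeze(1)).permute(0, 3, 1, 2)  # (b, 1, h, w) of mapped values

mapped.sum().backward()
print(table.weight.grad.shape)                     # torch.Size([256, 1]) -> the table receives gradients
print(im.requires_grad)                            # False -> the integer indices do not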