Through the responses in this post, I’ve identified the issue in my code.
The error RuntimeError: element 0 of variables does not require grad and does not have a grad_fn
occurred because I was recreating a tensor inside my custom function, which detached the output “out” from the computational graph.
My model and the function I implemented are structured as follows:
def forward(self, x):
    x = self.head(x)
    res = self.body(x)
    res += x
    x = self.tail(res)        # x.grad_fn returns a valid value
    out = custom_function(x)  # out.grad_fn is invalid (None)
    out = out.cuda()
    return out

def custom_function(im):  # custom upsample function
    new_im = custom_module1(im)
    upsampled_im = new_im.repeat_interleave(2, dim=2).repeat_interleave(2, dim=3)
    # calculation part using new_im
    out = custom_module2(upsampled_im)  # similar behavior to custom_module1
    return out

def custom_module1(im):
    b, _, h, w = im.shape
    new_im = torch.zeros((b, 3, h, w), dtype=torch.float32)  # recreating a tensor
    # calculation part using im
    return new_im
However, due to my constraints, I cannot avoid creating the tensor this way.
In this case, I need to write a custom autograd.Function. However, I’m having trouble understanding the example code provided for it and am unsure how to go about writing my own.
In my case, the function performs upsampling, and I find it challenging to formulate the backward (differentiation) function.
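From the docs, a custom Function skeleton would look roughly like the sketch below (using plain nearest-neighbour 2x upsampling as a stand-in, not my real modules). For this simple case I can see that the backward should sum each 2x2 output block back into one input pixel, but I don’t know how to extend this to my full custom modules:

import torch

class CustomUpsample(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # placeholder: plain nearest-neighbour 2x upsampling
        return x.repeat_interleave(2, dim=2).repeat_interleave(2, dim=3)

    @staticmethod
    def backward(ctx, grad_output):
        # each input pixel was copied into a 2x2 output block,
        # so its gradient is the sum over that block
        b, c, h2, w2 = grad_output.shape
        return grad_output.view(b, c, h2 // 2, 2, w2 // 2, 2).sum(dim=(3, 5))

# would be used as: out = CustomUpsample.apply(x)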
Can anyone help me with this?
If you are creating a new tensor, it does not have a gradient history and previous operations don’t affect this tensor in any way. I don’t think the actual backward computation is the issue in your use case, but rather the lack of a computation graph: since the tensor is created as a new leaf tensor, not via the upsampling ops, no gradients will flow back.
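A quick way to see this (a minimal sketch, not your actual modules):

import torch

x = torch.rand(1, 3, 4, 4, requires_grad=True)

new_im = torch.zeros(1, 3, 4, 4)    # fresh leaf tensor, no connection to x
print(new_im.grad_fn)               # None

up = x.repeat_interleave(2, dim=2)  # a real op on x, recorded in the graph
print(up.grad_fn)                   # a valid backward node (RepeatInterleaveBackward)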
Then my custom_function won’t be usable for network training…
Thank you for your response. It was very helpful.
Hello, ptrblck,
def custom_function(im):  # custom upsample function, same as before
    im = custom_module1(im)
    upsampled_im = im.repeat_interleave(2, dim=2).repeat_interleave(2, dim=3)
    # calculation part using im
    out = custom_module2(upsampled_im)  # similar behavior to custom_module1
    return out

def custom_module1(im):
    im[:, 0, :, :] = transfer_table[im[:, 0, :, :].to(torch.int64), 0]
    return im
I attempted to train the network after modifying my custom_module1 as above, but encountered another issue: the network isn’t learning properly.
It seems the problem arises from the value transformation applied by transfer_table, which is a nonlinear mapping. I suspect the gradient cannot flow back through this lookup-table mapping.
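A reduced example of what I think is happening (shapes and names are placeholders, not my real code):

import torch

transfer_table = torch.rand(256, 3)                    # placeholder lookup table
im = torch.rand(2, 3, 8, 8, requires_grad=True) * 255

idx = im[:, 0, :, :].to(torch.int64)                   # integer cast cuts the link back to im
mapped = transfer_table[idx, 0]

print(idx.requires_grad)  # False
print(mapped.grad_fn)     # None, since neither idx nor transfer_table carries a gradient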
I’m curious whether it’s possible to train a deep learning network using such a look-up table, or if there are any references using this approach.
You can train the lookup table itself, since that’s also what’s done in embedding layers, but not the input, since you are detaching it by transforming it to an integer type. Integer types are not usefully differentiable, since their gradient would be zero everywhere and undefined (or Inf) at the rounding points.
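Something along these lines (a minimal sketch, not your exact setup) would make the table itself learnable, e.g. via nn.Embedding:

import torch
import torch.nn as nn

table = nn.Embedding(num_embeddings=256, embedding_dim=1)  # learnable lookup table

im = torch.rand(2, 3, 8, 8) * 255
idx = im[:, 0, :, :].to(torch.int64)  # integer keys: no gradient w.r.t. im

mapped = table(idx).squeeze(-1)       # differentiable w.r.t. table.weight
mapped.mean().backward()
print(table.weight.grad.shape)        # torch.Size([256, 1]): gradients reach the table entries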