I have a network that acts on a tensor h, and at some point I need to access a quantity u that happens to be the gradient of an unrelated operation, MSE(F.interpolate(h) - y), with respect to h. The way I’m doing it at the moment is to create a cloned, detached version of h that requires grad, feed that into the operation, call backward(), and then read h_clone.grad. However, at test time this fails with “RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn”, presumably because in eval mode torch does not track gradients. What is the proper way to resolve this? Since all I really want is the derivative of the bicubic interpolation, is there some kind of backward call I can apply to the function directly to access the gradient? This would also let me have a gradient signal from u to h, as I’m afraid creating another clone object does not do that.
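For reference, here is a minimal version of what I’m currently doing (shapes are made up, and F.mse_loss stands in for MSE(F.interpolate(h) - y)):

```python
import torch
import torch.nn.functional as F

# Made-up shapes standing in for the real network activations
h = torch.randn(1, 3, 16, 16)
y = torch.randn(1, 3, 8, 8)

# Detached clone that requires grad, so backward() populates its .grad
h_clone = h.clone().detach().requires_grad_()
down = F.interpolate(h_clone, size=(8, 8), mode='bicubic', align_corners=False)
loss = F.mse_loss(down, y)
loss.backward()

u = h_clone.grad  # the quantity I want; same shape as h
```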
Thanks in advance,
(As an aside, what algorithm does torch’s bicubic interpolation use? I’ve written a small downsampling script to use the easily-computable derivative, since it’s just a convolution operation, following this, but it doesn’t seem to correspond to any of the standard filters.)
F.interpolate is stateless. You need a layer to be able to compute the grad directly with autograd.
Would torch.nn.Upsample with `mode='bicubic'` make sense in your use case?
Yeah, I was a bit afraid of that. Unfortunately, I want downsampling, and the docs for Upsample seem to imply it can’t do that (“If you want downsampling/general resizing, you should use interpolate()”). However, even if autograd doesn’t apply here, gradients can still flow through interpolate, right?
Yes, actually you are correct.
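A quick way to confirm this (made-up shapes):

```python
import torch
import torch.nn.functional as F

h = torch.randn(1, 3, 16, 16, requires_grad=True)
out = F.interpolate(h, size=(8, 8), mode='bicubic', align_corners=False)
out.sum().backward()

# h.grad is populated: the bicubic op is differentiable and autograd
# tracked it even though F.interpolate is a stateless function
print(h.grad.shape)  # torch.Size([1, 3, 16, 16])
```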
How do you create the detached clone?
If you do this, it should work:
h_clone = h.clone().detach().requires_grad_()
Yeah, that’s what I’m doing, but it didn’t work at validation time since gradients are disabled there. I added with torch.enable_grad() for the validation case and now it runs! But I still have an issue with part of this approach.
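For reference, the validation-time workaround looks roughly like this (made-up shapes, F.mse_loss standing in for the actual loss); torch.enable_grad() overrides the outer no_grad context used during evaluation:

```python
import torch
import torch.nn.functional as F

h = torch.randn(1, 3, 16, 16)
y = torch.randn(1, 3, 8, 8)

with torch.no_grad():  # simulates the validation loop
    with torch.enable_grad():  # locally re-enable autograd
        h_clone = h.clone().detach().requires_grad_()
        down = F.interpolate(h_clone, size=(8, 8), mode='bicubic', align_corners=False)
        F.mse_loss(down, y).backward()
    u = h_clone.grad  # available even though we're in eval / no_grad mode
```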
Let’s say I then compute h - h_clone.grad. That’s an operation that depends on h, so it should be counted in the backpropagation, right? But in this case h_clone.grad is a gradient-less vector, independent of h as far as autograd is concerned, so adding it wouldn’t change anything. This is why I’d like direct access to the derivative of F.interpolate: since gradients flow through it, is there any way to call the internal backward function that must exist somewhere in the C++ code?
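One thing that seems to address exactly this: torch.autograd.grad with create_graph=True keeps u itself connected to the graph, so an expression like h - u does backpropagate into h. A sketch with made-up shapes (this assumes double backward is implemented for bicubic interpolate, which I believe it is):

```python
import torch
import torch.nn.functional as F

h = torch.randn(1, 3, 16, 16, requires_grad=True)
y = torch.randn(1, 3, 8, 8)

down = F.interpolate(h, size=(8, 8), mode='bicubic', align_corners=False)
loss = F.mse_loss(down, y)

# create_graph=True builds a graph for the gradient computation itself,
# so u is a non-leaf tensor that still depends on h
(u,) = torch.autograd.grad(loss, h, create_graph=True)

(h - u).sum().backward()  # gradient flows through u back into h
```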
Again: what kind of filter does PyTorch’s downsampling use? Assuming it’s a standard downsampling filter, I would be able to write a derivative for it with simple operations.