Select input slices to calculate gradient for with torch.autograd.grad

I am interested in calculating the gradient of some arbitrary embedding output (B, C’, H’, W’) with respect to the input (dimensions of B, C, H, W). However, I only want to find the gradients for a specific subset of the input (e.g. input[:, :, y1:y2, x1:x2]).

Consider the following example code:

output = model(images)
grads = torch.autograd.grad(outputs=output, inputs=images[:, :, y1:y2, x1:x2], grad_outputs=torch.ones_like(output))

When trying to find grads, I error out with the following:

RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.

My understanding (from this) is that the error occurs because images[:, :, y1:y2, x1:x2] produces a new tensor via the slicing op, and that tensor was never used in the forward graph. How can I get around this?
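Here is a minimal standalone reproduction of the error (made-up shapes, no model involved):

import torch

x = torch.randn(1, 3, 8, 8, requires_grad=True)
y = (x * 2).sum()
# The slice below creates a new tensor via a slicing op; it was never used
# to compute y, so autograd raises the "not used in the graph" error.
torch.autograd.grad(outputs=y, inputs=x[:, :, 2:4, 2:4])  # RuntimeError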

NOTE:

I am not interested in taking a crop of the gradients. My images have a large spatial resolution, and I only need the gradients associated with a certain patch of the image; calculating the gradient with respect to the entire image is expensive.

So this solution will not work:

output = model(images)
grads = torch.autograd.grad(outputs=output, inputs=images, grad_outputs=torch.ones_like(output))[0]
grads = grads[:, :, y1:y2, x1:x2]

The error is expected, since slicing creates a new tensor through a differentiable operation, and that new tensor was never used in the forward pass. A workaround is described here.
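In short, the idea is to make the patch a leaf tensor and reassemble the image around it, so the patch is genuinely part of the forward graph. A minimal sketch, assuming images itself does not require gradients:

import torch

# Make the patch a leaf tensor that autograd can differentiate w.r.t.
patch = images[:, :, y1:y2, x1:x2].detach().requires_grad_(True)

# Reassemble the full image so that `patch` is a node in the forward graph.
middle = torch.cat([images[:, :, y1:y2, :x1], patch, images[:, :, y1:y2, x2:]], dim=3)
rebuilt = torch.cat([images[:, :, :y1, :], middle, images[:, :, y2:, :]], dim=2)

output = model(rebuilt)
grads = torch.autograd.grad(outputs=output, inputs=patch, grad_outputs=torch.ones_like(output))[0]

Note that the backward pass still runs through the whole model; restricting inputs only limits which input gradients are materialized.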

Thank you for sharing that reference. That’s a clever workaround. How would you recommend solving this problem when I’m interested in taking the gradient for overlapping image patches?

For some context, I am working with a CNN on image data, and I am interested in finding the derivative of each embedding with respect to the input pixels. I tried computing the full Jacobian, but I run into memory issues since it produces a (B, C, H, W, C’) tensor. I wanted to speed things up by only computing gradients over the portion of the image inside each embedding’s receptive field (this is early in the network, so there is a significant number of zero gradients that I should be able to skip computing).
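For reference, this is roughly the full-Jacobian attempt that hits memory limits (a sketch; torch.autograd.functional.jacobian materializes a tensor of shape output_shape + input_shape, i.e. (B, C’, H’, W’, B, C, H, W) here):

import torch
from torch.autograd.functional import jacobian

# Materializes the full Jacobian of every embedding w.r.t. every input pixel.
J = jacobian(lambda x: first_model(x), images)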

One solution is to take your suggestion to its limit and recreate the image as a stack of per-pixel tensors. I could then pass the appropriate 'slice' into torch.autograd.grad's inputs parameter as a flattened tuple. That seems unnecessarily complex, though. Am I missing something?

For clarity, my code looks something like this (currently taking a slice of the gradients after calling torch.autograd.grad):

import torch
from einops import rearrange

output = first_model(images)
output = rearrange(output, "b c h w -> (b h w) c")
for i, embed in enumerate(output):
    # retain_graph=True so repeated grad calls can reuse the graph
    embed_grads = torch.autograd.grad(outputs=embed, inputs=images, grad_outputs=torch.ones_like(embed), retain_graph=True)[0]
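If the reassembly workaround above were applied first (the hypothetical patch and rebuilt tensors from that sketch), the same loop could request gradients for the patch leaf only:

output = first_model(rebuilt)
output = rearrange(output, "b c h w -> (b h w) c")
for embed in output:
    # Gradients are only materialized for the patch, not the whole image
    patch_grads = torch.autograd.grad(outputs=embed, inputs=patch, grad_outputs=torch.ones_like(embed), retain_graph=True)[0]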

@skumar_ml
Maybe we can use torch.maximum the way torch.stack was used in @ptrblck's solution:

import numpy as np
import torch

x, y, z = img.shape  # assumes an (H, W, C) numpy image

# Zero canvases with the same dtype as the image
img1_full = np.zeros_like(img)
img2_full = np.zeros_like(img)
img3_full = np.zeros_like(img)

# Overlapping slices
img1_full[:x // 2, :y // 2, :] = img[:x // 2, :y // 2, :]
img2_full[x // 3:(2 * x) // 3, y // 3:(2 * y) // 3, :] = img[x // 3:(2 * x) // 3, y // 3:(2 * y) // 3, :]
img3_full[x // 2:, y // 2:, :] = img[x // 2:, y // 2:, :]

img1_tensor = torch.from_numpy(img1_full)
img2_tensor = torch.from_numpy(img2_full)
img3_tensor = torch.from_numpy(img3_full)
img_tensor = torch.from_numpy(img)

# Final image, which will act as input to the model
final_img = torch.maximum(torch.maximum(torch.maximum(img1_tensor, img2_tensor), img3_tensor), img_tensor)
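A sketch of how this could be used downstream (hypothetical model, assuming the HWC image is converted to a float NCHW batch); one caveat is that torch.maximum's backward may split the gradient between tied entries, and the full image ties with each patch inside its region:

# Hypothetical usage: treat one overlapping patch as a leaf and differentiate w.r.t. it
img2_tensor = img2_tensor.float().requires_grad_(True)
rest = torch.maximum(torch.maximum(img1_tensor.float(), img3_tensor.float()), img_tensor.float())
final_img = torch.maximum(rest, img2_tensor)

output = model(final_img.permute(2, 0, 1).unsqueeze(0))  # HWC -> NCHW (assumed layout)
grads = torch.autograd.grad(outputs=output, inputs=img2_tensor, grad_outputs=torch.ones_like(output))[0]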