I’m developing a model that takes a 3-channel input image, and outputs a 3-channel output image of the same size (256 x 256). I’m trying to get the gradient of the output image with respect to the input image. My code looks like below:

I’m having trouble understanding this: isn’t the gradient of a (3x256x256)-element vector with respect to (3x256x256) variables a matrix of shape (3x256x256, 3x256x256)? If not, which gradient is being returned in img_input.grad?
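A minimal sketch of what ends up in img_input.grad, assuming a recent PyTorch where tensors carry requires_grad directly. The small conv layer and the 4x4 image are stand-ins, not the model from the question. backward() on a non-scalar output needs a gradient argument v with the output's shape, and .grad then holds the vector-Jacobian product, which has the input's shape rather than the shape of the full Jacobian.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Conv2d(3, 3, 3, padding=1, bias=False)  # stand-in for the real model
img_input = torch.ones(1, 3, 4, 4, requires_grad=True)
img_output = model(img_input)

# v weights every output element; here all weights are 1.
v = torch.ones_like(img_output)
img_output.backward(gradient=v, retain_graph=True)
grad_from_v = img_input.grad.clone()

# The same result comes from differentiating the scalar sum of the output.
img_input.grad = None
img_output.sum().backward()
grad_from_sum = img_input.grad.clone()

print(grad_from_v.shape)  # same shape as img_input
print(torch.allclose(grad_from_v, grad_from_sum))
```

So with v set to all ones, .grad is the sum over all output elements of their individual gradients; a different v picks a different weighted combination of Jacobian rows.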

@Tom: what do you mean by many backwards? How would this look for a simple example? Say for the case when img_input and img_output are both 1x2x2 images, how do I get the gradient of each element (or a subset of elements) in img_output with respect to each element of img_input?

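import torch
import torch.nn as nn
from torch.autograd import Variable

model = nn.Conv2d(1, 1, 3, padding=1, bias=False)  # you'll have a better model
img_input = Variable(torch.ones(1, 1, 2, 2), requires_grad=True)
img_output = model(img_input)
onepix = torch.zeros(1, 1, 2, 2)  # one-hot selector for a single output pixel
for x in range(2):
    for y in range(2):
        onepix.zero_()
        onepix[0, 0, x, y] = 1
        img_input.grad = None  # clear the gradient left over from the previous pixel
        # one backward pass per output pixel: d img_output[0,0,x,y] / d img_input
        img_output.backward(gradient=onepix, retain_graph=True)
        print(x, y, img_input.grad)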

(In PyTorch 0.1.12 and earlier, retain_graph is called retain_variables.)

Of course, that is fine for checking things out on 1x2x2 images, but it needs one backward pass per output element, so it is not something you would want to use on a larger scale.
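If the full Jacobian really is what you need, newer PyTorch releases (1.5 and later, as far as I know) ship torch.autograd.functional.jacobian, which handles the per-element passes internally. A sketch, assuming that function is available in your version, using the same kind of toy conv layer on a 1x1x2x2 input. The result has shape output.shape + input.shape.

```python
import torch
import torch.nn as nn
from torch.autograd.functional import jacobian

model = nn.Conv2d(1, 1, 3, padding=1, bias=False)
img_input = torch.ones(1, 1, 2, 2)

# jac[0, 0, i, j, 0, 0, k, l] = d img_output[0, 0, i, j] / d img_input[0, 0, k, l]
jac = jacobian(model, img_input)
print(jac.shape)  # torch.Size([1, 1, 2, 2, 1, 1, 2, 2])

# Flatten to the (n_out, n_in) matrix form; both are 4 for this toy case.
jac_matrix = jac.reshape(4, 4)
print(jac_matrix.shape)  # torch.Size([4, 4])
```

Storage still grows with the product of the input and output sizes, so this does not make a full 3x256x256 Jacobian (roughly 3.9e10 entries) any cheaper to hold in memory.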

image_variable has size [1, 3, 299, 299]
prediction has size [1, 1000]

After .backward(), shouldn’t image_variable.grad have size [1, 3, 299, 299, 1000], since it is essentially a Jacobian?
But it has the same dimensions as the input, i.e. [1, 3, 299, 299].
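A sketch of why the shapes come out this way, using a hypothetical small linear classifier instead of the 1000-class network from the post. The names tiny_classifier and n_classes and the 3x8x8 input are made up for illustration. Each backward pass differentiates one scalar, so its gradient has the input's shape. Stacking one gradient per class gives the Jacobian, here with the class dimension in front.

```python
import torch
import torch.nn as nn

n_classes = 5  # the post has 1000; kept small here
tiny_classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, n_classes))

image_variable = torch.rand(1, 3, 8, 8, requires_grad=True)
prediction = tiny_classifier(image_variable)  # size [1, n_classes]

rows = []
for c in range(n_classes):
    # Gradient of the single scalar prediction[0, c] w.r.t. the image.
    (g,) = torch.autograd.grad(prediction[0, c], image_variable, retain_graph=True)
    rows.append(g)

print(rows[0].shape)        # torch.Size([1, 3, 8, 8]), same as the input
full_jacobian = torch.stack(rows)
print(full_jacobian.shape)  # torch.Size([5, 1, 3, 8, 8])
```

With 1000 classes this means 1000 backward passes, which is the same cost trade-off as the per-pixel loop, just with fewer outputs.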