Grid_sample: gradients wrt input image not as expected

I would like to apply a known affine transform inside my network using grid_sample, and then backpropagate through it so that the gradients reach the input image or feature map. I wrote the small test below, but the gradient with respect to the input does not look the way I expected.

Thank you for any assistance.

import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt

# Input image of ones; gradients are requested with respect to it.
example_input = torch.ones(1, 1, 28, 28, requires_grad=True)

# Known affine transform: identity plus a 0.5 translation
# (normalized coordinates, second row of theta).
theta = torch.tensor([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.5]]])

flow = F.affine_grid(theta, torch.Size((1, 1, 28, 28)))
shifted_input = F.grid_sample(example_input, flow)

# Target is a detached copy of the shifted output with one pixel set to zero.
target = shifted_input.detach().clone()
target[0, 0, 10, 10] = 0.0

loss = shifted_input - target
loss.sum().backward()

# Panels: loss, target, shifted input, gradient wrt the input.
plt.subplot(2, 2, 1)
plt.imshow(loss.detach()[0, 0].numpy())
plt.subplot(2, 2, 2)
plt.imshow(target[0, 0].numpy())
plt.subplot(2, 2, 3)
plt.imshow(shifted_input.detach()[0, 0].numpy())
plt.subplot(2, 2, 4)
plt.imshow(example_input.grad[0, 0].numpy())
plt.show()
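
One check that may help narrow this down, offered as a sketch rather than a diagnosis: the derivative of `loss.sum()` with respect to `shifted_input` is 1 at every pixel regardless of `target`, so the modified target pixel cannot show up in the input gradient of the test above. The snippet below compares that summed loss with a squared loss. The helper `input_gradient`, the second zeroed pixel `(3, 20)`, and the explicit `align_corners=False` are assumptions introduced for illustration, and a recent PyTorch version is assumed.

```python
import torch
import torch.nn.functional as F


def input_gradient(squared, zeroed_pixel=(10, 10)):
    """Return d(loss)/d(input) for the test setup (hypothetical helper)."""
    example_input = torch.ones(1, 1, 28, 28, requires_grad=True)
    theta = torch.tensor([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.5]]])

    # align_corners=False is an assumption; the original test does not set it.
    grid = F.affine_grid(theta, torch.Size((1, 1, 28, 28)), align_corners=False)
    shifted = F.grid_sample(example_input, grid, align_corners=False)

    # Same target construction as in the test: a copy with one pixel zeroed.
    target = shifted.detach().clone()
    row, col = zeroed_pixel
    target[0, 0, row, col] = 0.0

    diff = shifted - target
    # Summed difference (as in the test) versus summed squared difference.
    loss = (diff ** 2).sum() if squared else diff.sum()
    loss.backward()
    return example_input.grad[0, 0]


grad_sum_a = input_gradient(squared=False, zeroed_pixel=(10, 10))
grad_sum_b = input_gradient(squared=False, zeroed_pixel=(3, 20))
grad_sq = input_gradient(squared=True, zeroed_pixel=(10, 10))

print("summed loss, same gradient for both targets:",
      torch.allclose(grad_sum_a, grad_sum_b))
print("nonzero gradient entries, summed loss: ",
      (grad_sum_a != 0).sum().item())
print("nonzero gradient entries, squared loss:",
      (grad_sq != 0).sum().item())
```

If the squared-loss gradient is concentrated around the input location that the modified output pixel samples from, then backpropagation through grid_sample is behaving as intended and the surprise in the fourth panel comes from the choice of loss.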