Gradient computed on CPU but not on GPU?

I am writing some code for something similar to RoI pooling. The gradient propagates back correctly when I run on the CPU, but not on the GPU. Does anyone have any idea why? Thanks a lot.

Here is a demo.

CPU:

import numpy as np
import torch
from torch.autograd import Variable

out = torch.zeros(1, 3, 6, 6)
vout = Variable(out)
fmap = np.arange(3 * 6 * 6).reshape((1, 3, 6, 6))
tmap = Variable(torch.from_numpy(fmap).float(), requires_grad=True)

# build a 3x3 mask, select that region of the feature map, and take the per-channel max
mask = torch.zeros(1, 6, 6).byte()
mask[0, 2:5, 2:5] = 1
mask = Variable(mask.expand(1, 3, 6, 6))
masked = tmap.masked_select(mask).view(3, -1)
pooled = torch.max(masked, 1)[0][:, 0]
vout[0, :, 0, 0] = pooled
# similar to the operation above
mask = torch.zeros(1, 6, 6).byte()
mask[0, 3:6, 3:6] = 1
mask = Variable(mask.expand(1, 3, 6, 6))
masked = tmap.masked_select(mask).view(3, -1)
pooled = torch.max(masked, 1)[0][:, 0]
vout[0, :, 1, 1] = pooled

a = torch.mean(vout)
a.backward()
print(tmap.grad)

GPU:

import numpy as np
import torch
from torch.autograd import Variable

out = torch.zeros(1, 3, 6, 6)
vout = Variable(out).cuda()
fmap = np.arange(3 * 6 * 6).reshape((1, 3, 6, 6))
tmap = Variable(torch.from_numpy(fmap).float(), requires_grad=True).cuda()

mask = torch.zeros(1, 6, 6).byte().cuda()
mask[0, 2:5, 2:5] = 1
mask = Variable(mask.expand(1, 3, 6, 6))
masked = tmap.masked_select(mask).view(3, -1)
pooled = torch.max(masked, 1)[0][:, 0]
vout[0, :, 0, 0] = pooled

mask = torch.zeros(1, 6, 6).byte().cuda()
mask[0, 3:6, 3:6] = 1
mask = Variable(mask.expand(1, 3, 6, 6))
masked = tmap.masked_select(mask).view(3, -1)
pooled = torch.max(masked, 1)[0][:, 0]
vout[0, :, 1, 1] = pooled

a = torch.mean(vout)
a.backward()
print(tmap.grad)

The result is None.

I am using version 0.1.9.

It’s because you called .cuda() before assigning to tmap. Casting or copying a Variable to another device returns a different Variable, so tmap is not a leaf and will not have its gradient accumulated. Either do this:

tmap_leaf = Variable(...)
tmap = tmap_leaf.cuda()
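
(With this variant, after backward() the gradient is accumulated in tmap_leaf.grad, which stays on the CPU; tmap.grad remains None because tmap is still a non-leaf copy.)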

or, better:

tmap = Variable(torch.from_numpy(...).float().cuda(), requires_grad=True)
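
For illustration, here is a minimal sketch of the second fix, reusing fmap from the demo above (the pooling steps are replaced by a plain mean for brevity), showing that the gradient now lands on tmap:

import numpy as np
import torch
from torch.autograd import Variable

fmap = np.arange(3 * 6 * 6).reshape((1, 3, 6, 6))
# construct the Variable directly from a CUDA tensor so it is a leaf
tmap = Variable(torch.from_numpy(fmap).float().cuda(), requires_grad=True)

a = torch.mean(tmap)
a.backward()
print(tmap.grad)  # populated now, instead of None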

Oh, I get it. Thank you very much!