In the code snippets below, Method 2 produces a gradient whereas Method 1 doesn't. Any clues as to why?
Note that to perform .mm(), the tensors need to be floats or doubles. Also, torch.FloatTensor() doesn't accept a requires_grad argument.
# Method 1 - gradient NOT calculated
C = torch.tensor([0., 0., 1., 0.], requires_grad=True).double().view(2, 2) # float literals needed: integer tensors can't have requires_grad=True; .double() is needed for .mm() below to work
x = torch.tensor([0.5, 0.5]).view(2, 1).double() # .double() is needed for .mm() below to work
x = C.mm(x)
err = (1-x[1,0]).pow(2)
err.backward()
print(C.grad) # produces None
# Method 2 - gradient is calculated correctly
C = Variable(torch.FloatTensor([0, 0, 1, 0]).view(2, 2), requires_grad=True)
x = torch.FloatTensor([0.5, 0.5]).view(2, 1)
x = C.mm(x)
err = (1-x[1,0]).pow(2)
err.backward()
print(C.grad) # produces the correct gradient
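One way I've been probing the difference (a minimal sketch, assuming a recent PyTorch where .is_leaf is available): in Method 1 the name C is bound to the *output* of .double().view(), not to the tensor that was created with requires_grad=True, so C ends up as a non-leaf tensor, and .grad is only populated on leaf tensors. In Method 2, requires_grad is set on the final tensor itself, so it stays a leaf. The requires_grad_() call below is my substitute for the deprecated Variable wrapper, not the original code:

```python
import torch

# Method 1 rebuilt: .double() and .view() are autograd ops, so C1 is
# a non-leaf tensor; backward() will not populate C1.grad.
C1 = torch.tensor([0., 0., 1., 0.], requires_grad=True).double().view(2, 2)
print(C1.is_leaf)  # False

# Method 2 rebuilt: requires_grad is turned on *after* the reshaping,
# on the final tensor itself, so C2 is a leaf and C2.grad gets filled.
C2 = torch.FloatTensor([0, 0, 1, 0]).view(2, 2).requires_grad_()
print(C2.is_leaf)  # True
```

If that diagnosis is right, checking C.is_leaf right after constructing C in each method should show False for Method 1 and True for Method 2.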