In the code snippets below, Method 2 produces a gradient whereas Method 1 doesn't. Any clues as to why?
Note that to perform .mm(), the tensors need to be floats or doubles. Also, torch.FloatTensor() doesn't accept a requires_grad argument.
# Method 1 - gradient NOT calculated
C = torch.tensor([0., 0., 1., 0.], requires_grad=True).double().view(2, 2) # float literals needed: integer tensors can't have requires_grad=True; .double() is needed for .mm() below to work
x = torch.tensor([0.5, 0.5]).view(2, 1).double() # .double() is needed for .mm() below to work
x = C.mm(x)
err = (1-x[1,0]).pow(2)
err.backward()
print(C.grad) # produces None
# Method 2 - gradient is calculated correctly
C = Variable(torch.FloatTensor([0, 0, 1, 0]).view(2, 2), requires_grad=True)
x = torch.FloatTensor([0.5, 0.5]).view(2, 1)
x = C.mm(x)
err = (1-x[1,0]).pow(2)
err.backward()
print(C.grad) # produces the correct gradient
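One way I've been probing the difference (a minimal sketch, assuming a recent PyTorch where .is_leaf is available): in Method 1 the name C is bound to the *output* of .double().view(), not to the tensor that was created with requires_grad=True, so C ends up as a non-leaf tensor, and .grad is only populated on leaf tensors. In Method 2, requires_grad is set on the final tensor itself, so it stays a leaf. The requires_grad_() call below is my substitute for the deprecated Variable wrapper, not the original code:

```python
import torch

# Method 1 rebuilt: .double() and .view() are autograd ops, so C1 is
# a non-leaf tensor; backward() will not populate C1.grad.
C1 = torch.tensor([0., 0., 1., 0.], requires_grad=True).double().view(2, 2)
print(C1.is_leaf)  # False

# Method 2 rebuilt: requires_grad is turned on *after* the reshaping,
# on the final tensor itself, so C2 is a leaf and C2.grad gets filled.
C2 = torch.FloatTensor([0, 0, 1, 0]).view(2, 2).requires_grad_()
print(C2.is_leaf)  # True
```

If that diagnosis is right, checking C.is_leaf right after constructing C in each method should show False for Method 1 and True for Method 2.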