Hi,
I am seeing some behaviour that I would appreciate clarification on. The following code, where I create tensors from numpy arrays, works as expected: the gradients are computed after the backward pass and can be used to update the weights.
import numpy as np
import torch

data = torch.tensor(np.load('mnist_train_data.npy')).float()
label = torch.tensor(np.load('mnist_train_label.npy').argmax(axis=1)).long()

# parameters, explicitly cast to np.float32 before tensor creation
weights1 = torch.tensor(np.random.randn(784, 128).astype(np.float32), requires_grad=True).float()
bias1 = torch.tensor(np.random.randn(128).astype(np.float32), requires_grad=True).float()
weights2 = torch.tensor(np.random.randn(128, 10).astype(np.float32), requires_grad=True).float()
bias2 = torch.tensor(np.random.randn(10).astype(np.float32), requires_grad=True).float()

# forward pass: two linear layers with a ReLU in between
output1 = torch.nn.functional.relu(data @ weights1 + bias1)
output2 = output1 @ weights2 + bias2

loss = torch.nn.CrossEntropyLoss()(output2, label)
loss.backward()

# manual SGD step, learning rate 0.01
weights1.data -= 0.01 * weights1.grad.data
bias1.data -= 0.01 * bias1.grad.data
weights2.data -= 0.01 * weights2.grad.data
bias2.data -= 0.01 * bias2.grad.data
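For reference, a quick sanity check right after loss.backward() (just prints I added for debugging, not part of the update) confirms that the gradients are populated in this version:

print(weights1.is_leaf)           # prints True here
print(weights1.grad is not None)  # prints True after loss.backward()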
However, if I do not explicitly cast the numpy arrays from dtype np.float64 to dtype np.float32 before turning them into tensors, the following variation raises AttributeError: 'NoneType' object has no attribute 'data' as soon as the update step accesses weights1.grad.data:
# no explicit .astype(np.float32): the numpy arrays stay float64
weights1 = torch.tensor(np.random.randn(784, 128), requires_grad=True).float()
bias1 = torch.tensor(np.random.randn(128), requires_grad=True).float()
weights2 = torch.tensor(np.random.randn(128, 10), requires_grad=True).float()
bias2 = torch.tensor(np.random.randn(10), requires_grad=True).float()
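Running the same sanity check on this variant suggests where the error comes from, although I do not understand why the cast should make this difference:

print(weights1.is_leaf)  # now prints False
print(weights1.grad)     # None, so weights1.grad.data raises the AttributeError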
Moreover, if I initialize the tensors with the deprecated Variable API, the following variation appears to behave like the first example, even without the explicit cast to np.float32 before creating the tensors. The resulting losses remain similar.
# here the cast to float happens before the Variable wrapper sets requires_grad
weights1 = torch.autograd.Variable(torch.tensor(np.random.randn(784, 128)).float(), requires_grad=True)
bias1 = torch.autograd.Variable(torch.tensor(np.random.randn(128)).float(), requires_grad=True)
weights2 = torch.autograd.Variable(torch.tensor(np.random.randn(128, 10)).float(), requires_grad=True)
bias2 = torch.autograd.Variable(torch.tensor(np.random.randn(10)).float(), requires_grad=True)
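The same check on the Variable-based version shows leaf tensors again, presumably because requires_grad is set on the already-converted float tensor:

print(weights1.is_leaf)        # prints True
print(weights1.requires_grad)  # prints True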
This part seems particularly odd to me, since the PyTorch 0.4.0 Migration Guide states that torch.Tensor and torch.autograd.Variable are now the same class. I am using torch version 0.4.1 without CUDA.
Further, if I initialize the tensors with .double() instead of .float(), everything works as I would expect when I leave the numpy arrays with dtype np.float64.
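For concreteness, this is the .double() variant I mean (I also cast data to double so the matmul dtypes match; the forward pass, loss, and update steps are unchanged):

data = torch.tensor(np.load('mnist_train_data.npy')).double()
label = torch.tensor(np.load('mnist_train_label.npy').argmax(axis=1)).long()
weights1 = torch.tensor(np.random.randn(784, 128), requires_grad=True).double()
bias1 = torch.tensor(np.random.randn(128), requires_grad=True).double()
weights2 = torch.tensor(np.random.randn(128, 10), requires_grad=True).double()
bias2 = torch.tensor(np.random.randn(10), requires_grad=True).double()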
Any clarification on this behaviour would be appreciated.
Cheers