Hi,
I am seeing some behaviour that I would appreciate clarification on. The following code, where I create tensors from numpy arrays, works as expected: the gradients are computed after the backward pass and can be used to update the weights.
import numpy as np
import torch

data = torch.tensor(np.load('mnist_train_data.npy')).float()
label = torch.tensor(np.load('mnist_train_label.npy').argmax(axis=1)).long()

# parameters, explicitly cast to np.float32 before tensor creation
weights1 = torch.tensor(np.random.randn(784, 128).astype(np.float32), requires_grad=True).float()
bias1 = torch.tensor(np.random.randn(128).astype(np.float32), requires_grad=True).float()
weights2 = torch.tensor(np.random.randn(128, 10).astype(np.float32), requires_grad=True).float()
bias2 = torch.tensor(np.random.randn(10).astype(np.float32), requires_grad=True).float()

# forward pass: two linear layers with a ReLU in between
output1 = torch.nn.functional.relu(data @ weights1 + bias1)
output2 = output1 @ weights2 + bias2

loss = torch.nn.CrossEntropyLoss()(output2, label)
loss.backward()

# manual SGD step, learning rate 0.01
weights1.data -= 0.01 * weights1.grad.data
bias1.data -= 0.01 * bias1.grad.data
weights2.data -= 0.01 * weights2.grad.data
bias2.data -= 0.01 * bias2.grad.data
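For reference, a quick sanity check right after loss.backward() (just prints I added for debugging, not part of the update) confirms that the gradients are populated in this version:

print(weights1.is_leaf)           # prints True here
print(weights1.grad is not None)  # prints True after loss.backward()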
However, if I do not explicitly cast the numpy arrays from dtype np.float64 to dtype np.float32 before turning them into tensors, the following variation raises AttributeError: 'NoneType' object has no attribute 'data' as soon as the update step accesses weights1.grad.data:
# no explicit .astype(np.float32): the numpy arrays stay float64
weights1 = torch.tensor(np.random.randn(784, 128), requires_grad=True).float()
bias1 = torch.tensor(np.random.randn(128), requires_grad=True).float()
weights2 = torch.tensor(np.random.randn(128, 10), requires_grad=True).float()
bias2 = torch.tensor(np.random.randn(10), requires_grad=True).float()
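Running the same sanity check on this variant suggests where the error comes from, although I do not understand why the cast should make this difference:

print(weights1.is_leaf)  # now prints False
print(weights1.grad)     # None, so weights1.grad.data raises the AttributeError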
Moreover, if I initialize the tensors with the deprecated Variable API, the following variation appears to behave like the first example, even without the explicit cast to np.float32 before creating the tensors. The resulting losses remain similar.
# here the cast to float happens before the Variable wrapper sets requires_grad
weights1 = torch.autograd.Variable(torch.tensor(np.random.randn(784, 128)).float(), requires_grad=True)
bias1 = torch.autograd.Variable(torch.tensor(np.random.randn(128)).float(), requires_grad=True)
weights2 = torch.autograd.Variable(torch.tensor(np.random.randn(128, 10)).float(), requires_grad=True)
bias2 = torch.autograd.Variable(torch.tensor(np.random.randn(10)).float(), requires_grad=True)
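The same check on the Variable-based version shows leaf tensors again, presumably because requires_grad is set on the already-converted float tensor:

print(weights1.is_leaf)        # prints True
print(weights1.requires_grad)  # prints True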
This part seems particularly odd to me, since the PyTorch 0.4.0 Migration Guide states that torch.Tensor and torch.autograd.Variable are now the same class. I am using torch version 0.4.1 without CUDA.
Further, if I initialize the tensors with .double() instead of .float(), everything works as I would expect when I leave the numpy arrays with dtype np.float64.
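For concreteness, this is the .double() variant I mean (I also cast data to double so the matmul dtypes match; the forward pass, loss, and update steps are unchanged):

data = torch.tensor(np.load('mnist_train_data.npy')).double()
label = torch.tensor(np.load('mnist_train_label.npy').argmax(axis=1)).long()
weights1 = torch.tensor(np.random.randn(784, 128), requires_grad=True).double()
bias1 = torch.tensor(np.random.randn(128), requires_grad=True).double()
weights2 = torch.tensor(np.random.randn(128, 10), requires_grad=True).double()
bias2 = torch.tensor(np.random.randn(10), requires_grad=True).double()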
Any clarification on this behaviour would be appreciated.
Cheers