Hi,

I am seeing some behaviour that I would appreciate clarification on. The following code, where I create tensors from numpy arrays, works as expected: the gradients are computed after the backward pass and can be used to update the weights.

```
import numpy as np
import torch

data = torch.tensor(np.load('mnist_train_data.npy')).float()
label = torch.tensor(np.load('mnist_train_label.npy').argmax(axis=1)).long()
weights1 = torch.tensor(np.random.randn(784, 128).astype(np.float32), requires_grad=True).float()
bias1 = torch.tensor(np.random.randn(128).astype(np.float32), requires_grad=True).float()
weights2 = torch.tensor(np.random.randn(128, 10).astype(np.float32), requires_grad=True).float()
bias2 = torch.tensor(np.random.randn(10).astype(np.float32), requires_grad=True).float()

# forward pass, loss, backward pass
output1 = torch.nn.functional.relu(data @ weights1 + bias1)
output2 = output1 @ weights2 + bias2
loss = torch.nn.CrossEntropyLoss()(output2, label)
loss.backward()

# manual SGD update using the accumulated gradients
weights1.data -= 0.01 * weights1.grad.data
bias1.data -= 0.01 * bias1.grad.data
weights2.data -= 0.01 * weights2.grad.data
bias2.data -= 0.01 * bias2.grad.data
```

However, if I do not explicitly cast the numpy arrays from dtype `np.float64` to dtype `np.float32` before transforming them into tensors, the following variation raises `AttributeError: 'NoneType' object has no attribute 'data'`:

```
weights1 = torch.tensor(np.random.randn(784, 128), requires_grad=True).float()
bias1 = torch.tensor(np.random.randn(128), requires_grad=True).float()
weights2 = torch.tensor(np.random.randn(128, 10), requires_grad=True).float()
bias2 = torch.tensor(np.random.randn(10), requires_grad=True).float()
```
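
For reference, a minimal check (using a small random stand-in batch instead of the MNIST data, so the names here are purely illustrative) reproduces the problem: after `backward()`, `.grad` on such a tensor is `None`, which is presumably why the update lines fail, since those are the only places that access `.grad.data`.

```
# Minimal reproduction sketch with a stand-in batch (names are illustrative only)
w = torch.tensor(np.random.randn(784, 128), requires_grad=True).float()
x = torch.randn(4, 784)      # small random batch instead of the MNIST data
(x @ w).sum().backward()
print(w.grad)                # prints None -> w.grad.data then raises the AttributeError
```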

Moreover, if I initialize the tensors with the deprecated Variable API, the following variation appears to behave like the first example, even without the explicit cast to `np.float32` before creating the tensors. The resulting losses remain similar.

```
weights1 = torch.autograd.Variable(torch.tensor(np.random.randn(784, 128)).float(), requires_grad=True)
bias1 = torch.autograd.Variable(torch.tensor(np.random.randn(128)).float(), requires_grad=True)
weights2 = torch.autograd.Variable(torch.tensor(np.random.randn(128, 10)).float(), requires_grad=True)
bias2 = torch.autograd.Variable(torch.tensor(np.random.randn(10)).float(), requires_grad=True)
```

This part seems particularly odd to me, since the PyTorch 0.4.0 Migration Guide states that `torch.Tensor` and `torch.autograd.Variable` are now the same class. I am using torch version 0.4.1 without CUDA.

Further, if I initialize the tensors with `.double()` instead of `.float()`, everything works as I would expect when I leave the numpy arrays with dtype `np.float64`.
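
Concretely, this is the `.double()` variation I am referring to (I am assuming here that `data` is also kept as float64, i.e. created without the `.float()` cast, so that the dtypes match in the matrix multiplications):

```
# assumption: data is kept as double as well so that data @ weights1 type-checks
data = torch.tensor(np.load('mnist_train_data.npy')).double()
label = torch.tensor(np.load('mnist_train_label.npy').argmax(axis=1)).long()
weights1 = torch.tensor(np.random.randn(784, 128), requires_grad=True).double()
bias1 = torch.tensor(np.random.randn(128), requires_grad=True).double()
weights2 = torch.tensor(np.random.randn(128, 10), requires_grad=True).double()
bias2 = torch.tensor(np.random.randn(10), requires_grad=True).double()
```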

Any clarification on this behaviour would be appreciated.

Cheers