The difference between torch.tensor.data and torch.tensor

I wrote the following code:

import torch

a = torch.rand(3, 5)
type(a)        # torch.Tensor
type(a.data)   # torch.Tensor
id(a)          # 4493764504
id(a.data)     # 4493475488

I don’t understand the difference between Tensor a and Tensor a.data, and when to use a and when to use a.data.
Thanks.


Hi,

The .data field is an old attribute that is kept for backward compatibility, but it should not be used anymore: its use is dangerous and can make computations silently wrong. Use .detach() and/or with torch.no_grad() instead.
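For example, a minimal sketch (using a fresh tensor here rather than the exact one above), showing that both alternatives give you a value that autograd does not track:

import torch

a = torch.rand(3, 5, requires_grad=True)

# .detach() returns a tensor that shares storage with `a` but is cut
# out of the autograd graph; in-place changes to it are still seen by
# autograd's version counter, unlike changes made through .data:
b = a.detach()
print(b.requires_grad)   # False

# torch.no_grad() disables gradient tracking for everything computed
# inside the block:
with torch.no_grad():
    c = a * 2
print(c.requires_grad)   # False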


I noticed it’s still in this tutorial

_, predicted = torch.max(outputs.data, 1)
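For what it's worth, the .data there isn't needed; a sketch of the same line without it, assuming it sits inside the tutorial's evaluation loop (net and images are the tutorial's names):

with torch.no_grad():
    outputs = net(images)
    _, predicted = torch.max(outputs, 1)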


Hi @albanD, if .data should not be used, can you please explain how to do the following while still ending up with the adjusted weights in the model?
When I run:

i = 1
for w in LinModel.parameters():
    torch.nn.init.eye_(w)     # reset the weight to the identity
    w.data = w.data * i       # scale it by i, bypassing autograd via .data
    i += 1

the model weights are adjusted properly by a factor of i after the loop ends. If I don't use .data, only the identity init is saved.

Hi,

You can do either of the following:

with torch.no_grad():
  w *= i

Or, if you cannot do it in-place directly:

with torch.no_grad():
  w.copy_(w * i)
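Putting either form back into the original loop would look roughly like this (a sketch, assuming LinModel is the model from your snippet):

i = 1
for w in LinModel.parameters():
    with torch.no_grad():
        torch.nn.init.eye_(w)   # identity init, as before
        w *= i                  # scale in place without autograd recording it
    i += 1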

If we do an in-place operation on the .data property of a leaf variable with requires_grad=True, it seems fine. But what exactly is dangerous? Is it safe to use it in this PixelCNN code (pixelcnn-pytorch/masked_cnn_layer.py at master · axeloh/pixelcnn-pytorch · GitHub)? Does it still track the gradients?

Normally, when you make an in-place update on a tensor, its version counter is bumped, and autograd will raise an error during backward if it realizes that the original value of that tensor is needed for the gradient computation.

But if a tensor is saved for backward and you then mutate its .data property in place before calling backward, you lose the protection of that version-counter check and the results will be silently incorrect.
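Here is a minimal sketch of that failure mode (hypothetical tensors, just to illustrate):

import torch

x = torch.ones(3, requires_grad=True)
h = x * 3            # intermediate value
y = h ** 2           # pow saves `h` for the backward pass

# Mutating `h` itself in place would bump its version counter and make
# y.sum().backward() raise a RuntimeError about an in-place operation.
# Mutating h.data bypasses that check: no error, but the gradient is
# computed from the new values and is silently wrong.
h.data.add_(1)
y.sum().backward()
print(x.grad)        # tensor([24., 24., 24.]) instead of the correct 18s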
