Hi,
I saw that in 1.5 the usage of tensor.data was removed from most places, and I wonder:
- Why?
- I want to change weights explicitly so that the forward and backward passes will not use the same data (I have been using .data for this so far). What options do I have for that?
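To make it concrete, here is roughly the kind of update I do today (the model and lr below are just placeholders):

import torch
import torch.nn as nn

model = nn.Linear(4, 2)   # placeholder model
lr = 0.1                  # placeholder learning rate

# dummy forward/backward so that p.grad is populated
model(torch.randn(8, 4)).sum().backward()

# the .data-based update I have been doing so far
for p in model.parameters():
    p.data -= lr * p.grad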
Hi,
You can use

t2 = t.detach()

to get a new Tensor that has the same content but does not share the gradient history. Or run the ops inside a no_grad block so they are not tracked:

with torch.no_grad():
    # Ops here won't be tracked by autograd
    t.zero_()
Can you elaborate on the side effects you know of?
When is it a good idea to still use .data?
Sometimes changing a tensor in place raises autograd errors, for example:
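A minimal sketch of the kind of error I mean (modifying a value autograd still needs):

import torch

w = torch.randn(3, requires_grad=True)
out = w.sigmoid()      # sigmoid saves its output for the backward pass
out.zero_()            # in-place change to a tensor autograd still needs
out.sum().backward()   # RuntimeError: ... modified by an inplace operation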
It seems to me that tensor.data.clone() is slightly more efficient than tensor.detach().clone() when we explicitly want to clone just the data.
Similarly, when we want to send just the data to a device:

a = tensor.data.to(device)
# vs
a = tensor.detach().to(device)
# vs
with torch.no_grad():
    a = tensor.to(device)  # doesn't it send the grad too?
Can you elaborate on the side effects you know of?

The main confusing one is the aliasing with nn.Parameter: that class has an attribute called .data that refers to the underlying Tensor.

When is it a good idea to still use .data?
For 99.9% of the users: never.
The only case where it is at the moment (until we provide an API for it) is to go around the inplace correctness checks when they are too strict and you know exactly what you’re doing.
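To make that concrete, here is a contrived sketch of what going around the check can look like (adding zero changes no values, yet the plain in-place op would still trip the version-counter check):

import torch

x = torch.randn(3, requires_grad=True)
out = x.exp()              # exp() saves its output to compute the backward

# out.add_(0.0) would bump out's version counter and make backward() fail,
# even though adding zero changes no values. Going through .data skips the
# check entirely -- correctness is then entirely your responsibility.
out.data.add_(0.0)

out.sum().backward()       # x.grad is still correct here (equal to exp(x))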
It seems to me that tensor.data.clone() is slightly more efficient than tensor.detach().clone()
What do you mean by more efficient?
If you mean faster, then not really. Both do a shallow copy of the Tensor. .detach()
might be imperceptibly faster as it does not need to recreate inplace correctness tracking metadata.
For sending just the content to a new device: a = tensor.detach().to(device)
will do the trick.
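A small sketch to see the shallow copy and the device transfer (the data_ptr comparison assumes a CPU tensor):

import torch

t = torch.randn(4, requires_grad=True)

d = t.detach()
print(d.data_ptr() == t.data_ptr())  # True: same storage, nothing copied
print(d.requires_grad)               # False: no gradient history

# Sending just the values to another device (no history, no .grad):
device = "cuda" if torch.cuda.is_available() else "cpu"
a = t.detach().to(device)
print(a.requires_grad)               # False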
Thanks, this is very informative. Indeed I use .data to overcome the inplace correctness checks.
I didn't know .data does a shallow copy.
You say we can just avoid the correctness checks under torch.no_grad?
One thing I want to fully understand is the aliasing you mentioned in nn.Parameter:
a = torch.randn(1)
b = nn.Parameter(a)
When we do b.data, is it exactly like calling a.data? Is that what you mean? Is it a problem?
You say we can just avoid the correctness checks under torch.no_grad?
No, you only ignore these ops for gradient computation.
Avoiding correctness checks is the last “valid” use case for .data
in the sense that we don’t have another way to do this yet (be careful though! You can get wrong gradients by doing so!!)
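In other words, an in-place op under no_grad is simply not recorded, but it still bumps the version counter, so the correctness check still fires. A small sketch:

import torch

x = torch.randn(3, requires_grad=True)
out = x.exp()            # exp() saves its output for the backward pass

with torch.no_grad():
    out.zero_()          # not tracked, but the version counter still changes

out.sum().backward()     # RuntimeError: ... modified by an inplace operation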
When we do b.data, is it exactly like calling a.data?
I am not 100% sure of what the behavior is now that Variables don’t exist. But I would avoid it.
But yes, I think the two are the same, and so they have the same limitations as .data in general.
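A quick check of that aliasing (this assumes the current behaviour where nn.Parameter wraps the same storage as the tensor you pass in):

import torch
import torch.nn as nn

a = torch.randn(1)
b = nn.Parameter(a)

print(b.data.data_ptr() == a.data_ptr())  # True: same underlying storage
b.data.zero_()
print(a)                                  # tensor([0.]) -- a changed too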
Hi @albanD,
I don’t understand what the main differences are between calling tensor.data and tensor.detach().
Could you please help me out here?
My understanding is just this: both methods remove the tensor from the computational graph.
But I don’t understand what difference there is between the two methods.
Thanks.
Hi,
The difference is that one (.detach()) does it in a way that the autograd knows about, and the other (.data) hides it from the autograd.
This means that many of the sanity checks that the autograd does to ensure it always returns correct gradients won’t be able to run properly if you use .data.
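A classic illustration: with .data the overwritten value is used silently and the gradient is wrong, while with .detach() the autograd catches the problem:

import torch

a = torch.tensor([1., 2., 3.], requires_grad=True)

out = a.sigmoid()
c = out.data             # shares memory, but autograd does not know about it
c.zero_()                # silently overwrites the value saved for backward
out.sum().backward()
print(a.grad)            # tensor([0., 0., 0.]) -- wrong, and no error raised

a.grad = None
out = a.sigmoid()
c = out.detach()         # shares memory, and autograd tracks the change
c.zero_()
out.sum().backward()     # RuntimeError: ... modified by an inplace operation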