Hi,
I saw that in 1.5 the usage of tensor.data was removed from most places, and I wonder:
- Why?
- I want to change weights explicitly so that the forward and backward passes will not use the same data (I have been using .data for this so far). What options do I have for that?
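To make it concrete, here is roughly the kind of update I do today (the model and lr below are just placeholders):

import torch
import torch.nn as nn

model = nn.Linear(4, 2)   # placeholder model
lr = 0.1                  # placeholder learning rate

# dummy forward/backward so that p.grad is populated
model(torch.randn(8, 4)).sum().backward()

# the .data-based update I have been doing so far
for p in model.parameters():
    p.data -= lr * p.grad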
Hi,
You can use

t2 = t.detach()

to get a new Tensor that has the same content but does not share the gradient history. Or run the ops inside a no_grad block so they are not tracked:

with torch.no_grad():
    # Ops here won't be tracked by autograd
    t.zero_()
Can you elaborate on the side effects you know of?
When is it a good idea to still use .data?
Sometimes changing a tensor in place raises autograd errors, for example:
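A minimal sketch of the kind of error I mean (modifying a value autograd still needs):

import torch

w = torch.randn(3, requires_grad=True)
out = w.sigmoid()      # sigmoid saves its output for the backward pass
out.zero_()            # in-place change to a tensor autograd still needs
out.sum().backward()   # RuntimeError: ... modified by an inplace operation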
It seems to me that tensor.data.clone() is slightly more efficient than tensor.detach().clone() when we explicitly want to clone just the data.
Similarly, when we want to send just the data to a device:

a = tensor.data.to(device)
# vs
a = tensor.detach().to(device)
# vs
with torch.no_grad():
    a = tensor.to(device)  # doesn't it send the grad too?
Can you elaborate on the side effects you know of?

The main confusing one is the aliasing with nn.Parameter: that class has an attribute called .data that refers to the underlying Tensor.

When is it a good idea to still use .data?
For 99.9% of the users: never.
The only case where it is at the moment (until we provide an API for it) is to go around the inplace correctness checks when they are too strict and you know exactly what you’re doing.
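To make that concrete, here is a contrived sketch of what going around the check can look like (adding zero changes no values, yet the plain in-place op would still trip the version-counter check):

import torch

x = torch.randn(3, requires_grad=True)
out = x.exp()              # exp() saves its output to compute the backward

# out.add_(0.0) would bump out's version counter and make backward() fail,
# even though adding zero changes no values. Going through .data skips the
# check entirely -- correctness is then entirely your responsibility.
out.data.add_(0.0)

out.sum().backward()       # x.grad is still correct here (equal to exp(x))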
It seems to me that tensor.data.clone() is slightly more efficient than tensor.detach().clone()
What do you mean by more efficient?
If you mean faster, then not really. Both do a shallow copy of the Tensor. .detach()
might be imperceptibly faster as it does not need to recreate inplace correctness tracking metadata.
For sending just the content to a new device: a = tensor.detach().to(device)
will do the trick.
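A small sketch to see the shallow copy and the device transfer (the data_ptr comparison assumes a CPU tensor):

import torch

t = torch.randn(4, requires_grad=True)

d = t.detach()
print(d.data_ptr() == t.data_ptr())  # True: same storage, nothing copied
print(d.requires_grad)               # False: no gradient history

# Sending just the values to another device (no history, no .grad):
device = "cuda" if torch.cuda.is_available() else "cpu"
a = t.detach().to(device)
print(a.requires_grad)               # False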
Thanks, this is very informative. Indeed I use .data to overcome the inplace correctness checks.
I didn't know .data does a shallow copy.
You say we can just avoid the correctness checks under torch.no_grad?
One thing I want to fully understand is the aliasing you mentioned in nn.Parameter:
a = torch.randn(1)
b = nn.Parameter(a)
When we do b.data, is it exactly like calling a.data? Is that what you mean? Is it a problem?
You say we can just avoid the correctness checks under torch.no_grad?
No, you only ignore these ops for gradient computation.
Avoiding correctness checks is the last “valid” use case for .data
in the sense that we don’t have another way to do this yet (be careful though! You can get wrong gradients by doing so!!)
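In other words, an in-place op under no_grad is simply not recorded, but it still bumps the version counter, so the correctness check still fires. A small sketch:

import torch

x = torch.randn(3, requires_grad=True)
out = x.exp()            # exp() saves its output for the backward pass

with torch.no_grad():
    out.zero_()          # not tracked, but the version counter still changes

out.sum().backward()     # RuntimeError: ... modified by an inplace operation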
When we do b.data, is it exactly like calling a.data?
I am not 100% sure of what the behavior is now that Variables don’t exist. But I would avoid it.
But yes, I think the two are the same, and so they have the same limitations as .data in general.
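A quick check of that aliasing (this assumes the current behaviour where nn.Parameter wraps the same storage as the tensor you pass in):

import torch
import torch.nn as nn

a = torch.randn(1)
b = nn.Parameter(a)

print(b.data.data_ptr() == a.data_ptr())  # True: same underlying storage
b.data.zero_()
print(a)                                  # tensor([0.]) -- a changed too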
Hi @albanD,
I don’t understand what the main differences are between calling tensor.data and tensor.detach().
Could you please help me out here?
My understanding is just this: both methods remove the tensor from the computational graph.
But I don’t understand what difference there is between the two methods.
Thanks.
Hi,
The difference is that one (.detach()) does it in a way that the autograd knows about, and the other (.data) hides it from the autograd.
This means that many of the sanity checks that the autograd does to ensure it always returns correct gradients won’t be able to run properly if you use .data.
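A classic illustration: with .data the overwritten value is used silently and the gradient is wrong, while with .detach() the autograd catches the problem:

import torch

a = torch.tensor([1., 2., 3.], requires_grad=True)

out = a.sigmoid()
c = out.data             # shares memory, but autograd does not know about it
c.zero_()                # silently overwrites the value saved for backward
out.sum().backward()
print(a.grad)            # tensor([0., 0., 0.]) -- wrong, and no error raised

a.grad = None
out = a.sigmoid()
c = out.detach()         # shares memory, and autograd tracks the change
c.zero_()
out.sum().backward()     # RuntimeError: ... modified by an inplace operation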