from torch.nn import Linear, Parameter
import torch
layer = Parameter(torch.arange(4.))
y = layer.data
z = layer.data
print(id(y))
print(id(z))
print(id(layer.data))
z[0] = 100
print(layer)
print('Is z a view?', z._is_view())
the output is:
132203146695440
132203146810368
132203146811088
Parameter containing:
tensor([100., 1., 2., 3.], requires_grad=True)
Is z a view? False
Can someone explain this output? All of z, y, and layer.data are different objects, so I was thinking that accessing layer.data returns views, but that is not the case, as shown in the last print.
.data has been deprecated for public use for at least one major revision now. It is
still used internally by pytorch, so it is still around. Using .data can break how autograd
tracks gradients, so you should avoid using it. (I think it's fine to access .data, but you
shouldn't use it to modify the contents of a tensor.)
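As a concrete illustration of that last point (a minimal sketch of my own, based on the documented .data vs. .detach() gotcha, not code from this thread): an in-place write through .data is invisible to autograd's correctness checks, so backward() silently returns wrong gradients instead of raising an error.

import torch

a = torch.tensor([1., 2., 3.], requires_grad=True)
out = a.sigmoid()      # sigmoid's backward re-uses `out` itself
c = out.data           # shares storage, but bypasses the version-counter check
c.zero_()              # silently zeroes `out` as well
out.sum().backward()   # no error is raised
print(a.grad)          # tensor([0., 0., 0.]) -- wrong gradients
# Had we used c = out.detach() instead, the same zero_() would make backward()
# raise a RuntimeError about a tensor modified by an in-place operation.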
Parameter is a subclass of Tensor, and I'm pretty sure that Parameter.data gets
delegated to Tensor.data without any modification. However, .data appears to
return some sort of proxy object (that gets created on the fly). Even though .data looks
like a “property,” python supports things that look like properties but that actually invoke
methods under the hood.
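Here is a toy, non-pytorch sketch of such a "property that runs a method" (my own illustration of the idea, not how pytorch actually implements .data): every access builds a brand-new handle object, yet all the handles share the same backing data, just like your y and z.

class Handle:
    def __init__(self, backing):
        self.backing = backing       # shared reference, not a copy

class Box:
    def __init__(self, values):
        self._values = values

    @property
    def data(self):                  # looks like an attribute, but runs a method
        return Handle(self._values)  # a fresh Handle object on every access

b = Box([1., 2., 3.])
h1, h2 = b.data, b.data
print(h1 is h2)                      # False: two distinct wrapper objects
h1.backing[0] = 100.
print(b._values)                     # [100.0, 2.0, 3.0]: same underlying data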
When you call layer.data, a new proxy object gets created, but if you call layer.data
again, the new proxy object may or may not reside in the same location in memory,
depending on whether the previous proxy object had been freed (typically because it had
no references bound to it such as your y or z).
Try:
for _ in range(10):
    print(id(layer.data))
vs.
l = []
for _ in range(10):
    print(id(layer.data))
    l.append(torch.ones(1))   # use up some locations in memory
The point is that id(layer.data) is not telling you where the actual “data” resides in
memory; it’s telling you where the proxy object – which was created on the fly – resides
in memory.
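One way to see this concretely (a small sketch of my own, not from the post above): keep several .data wrappers alive at the same time, so their id()s are guaranteed to differ, and compare the storage pointer underneath them, which is where the elements actually live.

import torch
from torch.nn import Parameter

layer = Parameter(torch.arange(4.))
wrappers = [layer.data for _ in range(3)]   # keep references so the ids can't be recycled
print([id(w) for w in wrappers])            # three different wrapper objects
print({w.untyped_storage().data_ptr() for w in wrappers})  # a single shared storage address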
Such a proxy object is in some ways like a view – it’s a “handle” to some stuff – but it’s
not considered an official pytorch view (perhaps because it doesn’t provide a transformed
view into the underlying stuff).
Note that because its public use is deprecated, how it works and what it does may change
unexpectedly, so I haven’t really tried to understand the details.
Thanks for the comment @KFrank! It cleared up a lot of my confusion.
With regards to PyTorch views:
Such a proxy object is in some ways like a view – it’s a “handle” to some stuff – but it’s
not considered an official pytorch view (perhaps because it doesn’t provide a transformed
view into the underlying stuff).
If I change the way of checking for views (see the last line):
from torch.nn import Linear, Parameter
import torch
layer = Parameter(torch.arange(4.))
y = layer.data
z = layer.data
print(id(y))
print(id(z))
print(id(layer.data))
z[0] = 100
print(layer)
print('Is z a view?', z.untyped_storage().data_ptr() == y.untyped_storage().data_ptr())
the output is:
130347870622480
130347870737408
130347870738128
Parameter containing:
tensor([100., 1., 2., 3.], requires_grad=True)
Is z a view? True
So maybe tensor._is_view() is not the right way to check for views (contrary to its name)?
Another example (due to .detach(), y shares the same storage as x):
>>> x = torch.arange(3.)
>>> y = x.detach()
>>> x._is_view(), y._is_view()
(False, False)
>>> y.untyped_storage().data_ptr() == x.untyped_storage().data_ptr()
True
By drilling down to .data_ptr(), you are showing that the two higher-level objects do,
ultimately, refer to the same underlying data.
But this isn’t what pytorch means by “view.” Things like .reshape() or .permute() let
you look at the same underlying data, but reorganized somehow. (If, for example, you
wanted the permuted version of a tensor without using a view, you would have to make
a new copy of the tensor’s elements with their order shuffled. A view, in a sense, does
that shuffling “on the fly,” without the need to copy the elements.)
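To make that concrete (my own sketch, not part of the reply): .permute() gives a view that shares storage and merely changes the strides, whereas .contiguous() actually copies the elements into a new storage in the shuffled order.

import torch

x = torch.arange(6., requires_grad=True).reshape(2, 3)
p = x.permute(1, 0)                 # a view: same storage, reordered via strides
print(p._is_view())                 # True
print(p.untyped_storage().data_ptr() == x.untyped_storage().data_ptr())  # True
print(x.stride(), p.stride())       # (3, 1) vs. (1, 3): no elements were copied
c = p.contiguous()                  # materializes the shuffled order into a copy
print(c.untyped_storage().data_ptr() == x.untyped_storage().data_ptr())  # False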
So a view does refer to the same underlying data, but reorganizes that underlying data
somehow (without copying it). Furthermore, views are tracked properly by autograd.
(Lower level things like .data and .data_ptr() aren’t tracked properly by autograd,
and if you use them, backpropagation won’t work properly.)
Consider:
>>> import torch
>>> torch.__version__
'2.6.0+cu126'
>>> t = torch.ones (1, requires_grad = True)
>>> v = t.reshape ((1,))
>>> d = t.data
>>> t
tensor([1.], requires_grad=True)
>>> v
tensor([1.], grad_fn=<ViewBackward0>)
>>> d
tensor([1.])
>>> t._is_view()
False
>>> v._is_view()
True
>>> d._is_view()
False