What happens to a view of a tensor when the original tensor is deleted?

Hey all. My question is in the title:

What happens to a view of a tensor when the original tensor is deleted? It seems that the view still holds onto its memory, but does it hold a reference to the full tensor or only to the part that concerns it?

Let me give you an example:

import torch
a = torch.zeros(10, 100)
b = a[0]

del a

print(b)
>>> tensor([0., 0., 0., ...])

If I understand correctly, b shares its data with a; there is no copy. So what happens if I delete a?

Is everything deleted except the data shared by a[0] and b?

Or is a actually still in memory, with only the reference to it deleted? Does that mean that until I delete b (or the Python gc deletes it) I will carry all of the weight of a?

Thanks in advance for your answers.

Hi Samsja!

Nothing.

The “view” holds a reference to all of the data that comprised the original
tensor.

We can use a Tensor’s .storage() method to probe its “storage” and
_TypedStorage’s .data_ptr() method to see the actual location of
the data itself.

Consider:

>>> import torch
>>> print (torch.__version__)
1.13.0
>>>
>>> a = torch.zeros (2, 3)
>>> b = a[0]
>>>
>>> a
tensor([[0., 0., 0.],
        [0., 0., 0.]])
>>> b
tensor([0., 0., 0.])
>>>
>>> a.storage()
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
[torch.storage.TypedStorage(dtype=torch.float32, device=cpu) of size 6]
>>> b.storage()
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
[torch.storage.TypedStorage(dtype=torch.float32, device=cpu) of size 6]
>>> id (a.storage().data_ptr())
2253513833008
>>> id (b.storage().data_ptr())
2253513833040
>>> a.storage().data_ptr()
2253469255680
>>> b.storage().data_ptr()
2253469255680
>>>
>>> del a
>>>
>>> a
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'a' is not defined
>>>
>>> b
tensor([0., 0., 0.])
>>> b.storage()
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
[torch.storage.TypedStorage(dtype=torch.float32, device=cpu) of size 6]
>>> b.storage().data_ptr()
2253469255680

When you run (the Python statement) del a, the reference a is deleted
(but other references to the same object – should they exist – are not
affected) and the name a is removed from whatever the relevant Python
namespace is.
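As a small sketch of that name-binding behavior (the name alias is mine, introduced just for illustration): a second reference keeps the same Tensor object alive after del a.

```python
import torch

a = torch.zeros(3)
alias = a    # a second reference to the very same Tensor object

del a        # unbinds the name 'a' from the current namespace

# The Tensor object itself survives, because 'alias' still refers to it.
print(alias)  # tensor([0., 0., 0.])
```

Only when the last reference goes away is the Tensor object itself destroyed.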

In this case the Tensor referred to by a only had one reference to it (a
itself), so that specific Tensor object no longer exists. But things that were
referred to by that Tensor will still exist if other things refer to
them. The _TypedStorage object referred to by a.storage() will also be
deleted, as it is not the same _TypedStorage object as the one referred to
by b.storage().

But the underlying data, pointed to by both a.storage().data_ptr() and
b.storage().data_ptr(), is not deleted, and, as illustrated in the above
example, still contains all of the data underlying the tensor a, even though
b only uses a portion of it.

So, even though b only uses part of a’s underlying data, all of a’s data
still exists, taking up what could be (for larger tensors) a significant amount
of memory. If you want del a to free up a’s data, you would want to do
something like b = a[0].clone() (and then call del a).
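To make that concrete, here is a small sketch (not from the transcript above; the names view and copy are mine) comparing the storage held by a view with the storage held by a clone:

```python
import torch

a = torch.zeros(10, 100)

view = a[0]          # shares a's storage: all 1000 elements
copy = a[0].clone()  # gets its own storage: just the 100 elements it needs

print(view.storage().size())  # 1000
print(copy.storage().size())  # 100

del a
# The view alone now keeps all 1000 elements of a's data alive;
# the clone carries only its own 100 elements.
print(view.storage().size())  # still 1000
```

So slicing followed by .clone() (and then del a) is the pattern that actually releases the large buffer.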

Best.

K. Frank


Thank you very much for the very detailed explanation; this is very useful.

So that’s exactly what I “feared”: you might extract an element of a bigger tensor (say, in a retrieval task), and if you don’t explicitly copy the data to a smaller storage, you will always carry the bigger tensor in memory.
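A hypothetical retrieval-style sketch of that trap (the name table and the sizes are made up for illustration): slicing a row out of a large table pins the whole table’s data, while cloning lets it be freed.

```python
import torch

# Hypothetical large table from which we only want to keep one row.
table = torch.zeros(10_000, 512)

row_view = table[42]          # view: pins all 10_000 * 512 floats
row_copy = table[42].clone()  # copy: owns just its 512 floats

del table

print(row_view.storage().size())  # 5120000 -- the whole table lives on
print(row_copy.storage().size())  # 512     -- only what we need
```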