Any way to check if two tensors have the same base

OK, here is the example.

x = torch.randn(4, 4)
y = x.view(2,-1)

How can I make sure that y shares the same underlying data as x but has different metadata, and that no clone() operation was involved, as there would be in:

x = torch.randn(4, 4)
y = x.clone().view(2,-1)

or when reshape() is called on a non-contiguous tensor and has to copy.
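
For example, here is roughly what I mean by the non-contiguous reshape() case (the transpose is just for illustration; I believe reshape() has to copy there since a view is not possible):

import torch
x = torch.randn(4, 4)
xt = x.t()               # non-contiguous view of x
y = xt.reshape(2, -1)    # reshape() cannot return a view here, so it copies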


I'm not sure I fully understood your question, but I'll try to answer:

import torch
x = torch.randn(4, 4)
y = x.view(2,-1)
print(x.data_ptr() == y.data_ptr()) # prints True
y = x.clone().view(2,-1)
print(x.data_ptr() == y.data_ptr()) # prints False

But it doesn't work if you are interested in comparing tensor storage. For instance, in the example below, x and y share storage, but they don't share the same data pointer:

import torch
x = torch.arange(10)
y = x[1::2]
print(x.data_ptr() == y.data_ptr()) # prints False

The following snippet checks whether two tensors share storage, though I guess there might be a more efficient way of doing this.

import torch
def same_storage(x, y):
    # Collect the memory address of every element of each tensor
    x_ptrs = set(e.data_ptr() for e in x.view(-1))
    y_ptrs = set(e.data_ptr() for e in y.view(-1))
    # True if one tensor's element addresses are a subset of the other's
    return (x_ptrs <= y_ptrs) or (y_ptrs <= x_ptrs)

x = torch.arange(10)
y = x[1::2]
print(same_storage(x, y)) # prints True
z = y.clone()
print(same_storage(x, z)) # prints False
print(same_storage(y, z)) # prints False

Hello @LeviViana,

I found that .data_ptr() returns the memory address of the first element, so comparing it checks whether two tensors start at the same address. What about x is y? If I understand correctly, it also checks whether the tensors have the same memory address.

In the first code snippet, x and y should share the same memory, so why does x.data_ptr() == y.data_ptr() return True while x is y returns False?

Could you give me some ideas?

x.data_ptr()  # 94522278737584
id(x)         # 140148138888576
y.data_ptr()  # 94522278737584
id(y)         # 140152047258360

import torch
x=torch.arange(10)
y=x.view(-1)
z=x.reshape(2,5)
c=x.clone()
d=x[1::2] 
print(x, x.data_ptr())
print(y, y.data_ptr())
print(z, z.data_ptr())
print(c, c.data_ptr())
print(d, d.data_ptr())
print("...")

print(id(x), id(x.data))
print(id(y), id(y.data))
print(id(z), id(z.data))
print(id(c), id(c.data))
print(id(d), id(d.data))

Returns:

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) 73968896
tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) 73968896
tensor([[0, 1, 2, 3, 4],
        [5, 6, 7, 8, 9]]) 73968896
tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) 73969024
tensor([1, 3, 5, 7, 9]) 73968904
...
139879908757576 139879908721648
139879908724240 139879908721648
139879908723736 139879908721648
139879908722440 139879908721648
139879908721936 139879909096904

data_ptr() was what I was looking for. For each variable var, id(var) returns a different result, while id(var.data) returns the same id almost always, but not always.

PyTorch is still young. I hoped the storage object might provide a hint for the "same storage" idea, but it does not.

import torch
def same_storage(x, y):
    # Each call to .storage() returns a new Python wrapper object,
    # so an identity check with `is` fails even when the memory is shared.
    return x.storage() is y.storage()

x = torch.arange(10)
y = x[1::2]
z = y.clone()
print(same_storage(x, y)) # prints False
print(same_storage(x, z)) # prints False
print(same_storage(y, z)) # prints False

From docs.python.org:

"Every object has an identity, a type and a value. An object's identity never changes once it has been created; you may think of it as the object's address in memory. The 'is' operator compares the identity of two objects; the id() function returns an integer representing its identity."

I tested this on CPU, but the problem is probably that PyTorch is implemented in C/C++, and the storage objects we see in Python are just wrappers around memory locations handled on the C side. So if only we could use Python to get at these underlying memory locations…
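
For example, a quick check (just my attempt to illustrate the point) suggests that every .storage() call hands back a new Python wrapper around the same underlying memory:

import torch

x = torch.arange(10)
s1 = x.storage()
s2 = x.storage()
print(s1 is s2)                        # False: two distinct Python wrapper objects
print(s1.data_ptr() == s2.data_ptr())  # True: both wrap the same underlying memory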


Hello @MariosOreo,

It seems to me that x and y should be treated as different objects. The reason they share storage is for performance (saving memory & not copying the original tensor).

Moreover, the is operator is very restrictive. For instance, [] is [] returns False, whereas [] == [] returns True. In the first case, Python tests whether two different empty lists are the same object, which by construction is false. In the second case, Python tests whether the two list objects have the same value, which is true. Back to our case, I think it is now clear why x is y returns False.
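
To make it concrete, a small sketch of the distinction could look like this:

import torch

x = torch.randn(4, 4)
y = x.view(2, -1)
a, b = [], []
print(a is b)                        # False: two distinct list objects
print(a == b)                        # True: equal values
print(x is y)                        # False: x and y are different tensor objects
print(x.data_ptr() == y.data_ptr())  # True: they still share the same memory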


@LeviViana

Your solution works well for checking whether one tensor is a subset of another. However, it still has a limitation when two different sub-tensors point to the same storage without one being a subset of the other.

import torch

t = torch.rand((3,3))
t1 = t[0,:]
t2 = t[:,0]

same_storage(t1, t2)

>> False

However, if I change t1[0], t2[0] changes as well, because the two views overlap at t[0, 0].
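
Here is a quick illustration of that overlap:

import torch

t = torch.rand((3, 3))
t1 = t[0, :]   # first row, includes t[0, 0]
t2 = t[:, 0]   # first column, also includes t[0, 0]
t1[0] = 42.0
print(t2[0])   # tensor(42.): both views alias the same element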

For a more general case:

import torch
def same_storage(x, y):
    return x.storage().data_ptr() == y.storage().data_ptr()

# It works on your test.
t = torch.rand((3,3))
t1 = t[0,:]
t2 = t[:,0]

print(same_storage(t1, t2)) # prints True
 
x = torch.arange(10)
y = x[1::2]
print(same_storage(x, y)) # prints True
z = y.clone()
print(same_storage(x, z)) # prints False
print(same_storage(y, z)) # prints False

@albanD can you confirm that x.storage().data_ptr() == y.storage().data_ptr() is the correct way to check if two tensors x and y share the same storage?

Hey,

While it does work, I am not sure I would ever recommend using it :smiley:
If you need to do this check, you are most likely doing something very weird. And you should understand what the storage is in this case to know what you're doing.


@albanD keep up the good work. Thanks.

@albanD Why is var.storage().data_ptr() sometimes different from var.data_ptr()? What does a tensor wrap, other than the "storage", that makes the memory addresses differ? And why is id(x) different from x.data_ptr()? Isn't it supposed to be the same?
Thanks in advance

Hi,

var.data_ptr() will point to the beginning of the data used by the Tensor. This might not be the beginning of the storage. You can check var.storage_offset() to see the offset in question.
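
For example, something along these lines should show the relation for a simple 1D slice:

import torch

x = torch.arange(10)
y = x[3:]                  # view starting 3 elements into x's storage
print(y.storage_offset())  # 3
print(y.data_ptr() == y.storage().data_ptr() + y.storage_offset() * y.element_size())  # True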

id(x) is Python giving you a unique id for the Python object x. This has nothing to do with the storage of a Tensor.
If we were using the data_ptr() for this, you can see that this id would very quickly stop being unique as soon as two Tensors look at the same memory.
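
For instance, a tensor and a view of it share memory but are distinct Python objects:

import torch

x = torch.randn(4, 4)
y = x.view(16)                       # same memory, different Python object
print(x.data_ptr() == y.data_ptr())  # True: both start at the same address
print(id(x) == id(y))                # False: two distinct Python objects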