Any way to check if two tensors have the same base

OK, here is the example.

x = torch.randn(4, 4)
y = x.view(2,-1)

How can I make sure that y shares the same underlying data as x but has different metadata, and that no clone() operation was involved, as there would be in:

x = torch.randn(4, 4)
y = x.clone().view(2,-1)

or when reshape() is called on a non-contiguous tensor and has to copy.
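
For example, here is roughly what I mean by the non-contiguous reshape() case (the transpose is just for illustration; I believe reshape() has to copy there since a view is not possible):

import torch
x = torch.randn(4, 4)
xt = x.t()               # non-contiguous view of x
y = xt.reshape(2, -1)    # reshape() cannot return a view here, so it copies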


I'm not sure I fully understood your question, but I'll try to answer:

import torch
x = torch.randn(4, 4)
y = x.view(2,-1)
print(x.data_ptr() == y.data_ptr()) # prints True
y = x.clone().view(2,-1)
print(x.data_ptr() == y.data_ptr()) # prints False

But it doesn't work if you are interested in comparing tensor storage. For instance, in the example below, x and y share storage, but they don't share the same data pointer:

import torch
x = torch.arange(10)
y = x[1::2]
print(x.data_ptr() == y.data_ptr()) # prints False

The following snippet checks whether two tensors share storage, though I guess there might be a more efficient way of doing this.

import torch
def same_storage(x, y):
    # Collect the memory address of every element of each tensor
    x_ptrs = set(e.data_ptr() for e in x.view(-1))
    y_ptrs = set(e.data_ptr() for e in y.view(-1))
    # True if one tensor's element addresses are a subset of the other's
    return (x_ptrs <= y_ptrs) or (y_ptrs <= x_ptrs)

x = torch.arange(10)
y = x[1::2]
print(same_storage(x, y)) # prints True
z = y.clone()
print(same_storage(x, z)) # prints False
print(same_storage(y, z)) # prints False

Hello @LeviViana,

I found that .data_ptr() returns the memory address of the first element, so comparing it checks whether two tensors start at the same address. What about x is y? If I understand correctly, it also checks whether the tensors have the same memory address.

In the first code snippet, x and y should share the same memory, so why does x.data_ptr() == y.data_ptr() return True while x is y returns False?

Could you give me some ideas?

x.data_ptr()  # 94522278737584
id(x)         # 140148138888576
y.data_ptr()  # 94522278737584
id(y)         # 140152047258360

import torch
x=torch.arange(10)
y=x.view(-1)
z=x.reshape(2,5)
c=x.clone()
d=x[1::2] 
print(x, x.data_ptr())
print(y, y.data_ptr())
print(z, z.data_ptr())
print(c, c.data_ptr())
print(d, d.data_ptr())
print("...")

print(id(x), id(x.data))
print(id(y), id(y.data))
print(id(z), id(z.data))
print(id(c), id(c.data))
print(id(d), id(d.data))

Returns:

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) 73968896
tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) 73968896
tensor([[0, 1, 2, 3, 4],
        [5, 6, 7, 8, 9]]) 73968896
tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) 73969024
tensor([1, 3, 5, 7, 9]) 73968904
...
139879908757576 139879908721648
139879908724240 139879908721648
139879908723736 139879908721648
139879908722440 139879908721648
139879908721936 139879909096904

data_ptr() was what I was looking for. For each variable var, id(var) returns a different result, while id(var.data) returns the same id almost always, but not always.

PyTorch is still young. I hoped the storage object might provide a hint for the "same storage" idea, but it does not.

import torch
def same_storage(x, y):
    # Each call to .storage() returns a new Python wrapper object,
    # so an identity check with `is` fails even when the memory is shared.
    return x.storage() is y.storage()

x = torch.arange(10)
y = x[1::2]
z = y.clone()
print(same_storage(x, y)) # prints False
print(same_storage(x, z)) # prints False
print(same_storage(y, z)) # prints False

From docs.python.org:

"Every object has an identity, a type and a value. An object's identity never changes once it has been created; you may think of it as the object's address in memory. The 'is' operator compares the identity of two objects; the id() function returns an integer representing its identity."

I tested this on CPU, but the problem is probably that PyTorch is implemented in C/C++, and the storage objects we see in Python are just wrappers around memory locations handled on the C side. So if only we could use Python to get at these underlying memory locations…
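
For example, a quick check (just my attempt to illustrate the point) suggests that every .storage() call hands back a new Python wrapper around the same underlying memory:

import torch

x = torch.arange(10)
s1 = x.storage()
s2 = x.storage()
print(s1 is s2)                        # False: two distinct Python wrapper objects
print(s1.data_ptr() == s2.data_ptr())  # True: both wrap the same underlying memory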


Hello @MariosOreo,

It seems to me that x and y should be treated as different objects. The reason they share storage is for performance (saving memory & not copying the original tensor).

Moreover, the is operator is very restrictive. For instance, [] is [] returns False, whereas [] == [] returns True. In the first case, Python tests whether two different empty lists are the same object, which by construction is false. In the second case, Python tests whether the two list objects have the same value, which is true. Back to our case, I think it is now clear why x is y returns False.
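
To make it concrete, a small sketch of the distinction could look like this:

import torch

x = torch.randn(4, 4)
y = x.view(2, -1)
a, b = [], []
print(a is b)                        # False: two distinct list objects
print(a == b)                        # True: equal values
print(x is y)                        # False: x and y are different tensor objects
print(x.data_ptr() == y.data_ptr())  # True: they still share the same memory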


@LeviViana

Your solution works well for checking whether one tensor is a subset of another. However, it still has a limitation when two different sub-tensors point to the same storage without one being a subset of the other.

import torch

t = torch.rand((3,3))
t1 = t[0,:]
t2 = t[:,0]

same_storage(t1, t2)

>> False

However, if I change t1[0], t2[0] changes as well, because the two views overlap at t[0, 0].
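
Here is a quick illustration of that overlap:

import torch

t = torch.rand((3, 3))
t1 = t[0, :]   # first row, includes t[0, 0]
t2 = t[:, 0]   # first column, also includes t[0, 0]
t1[0] = 42.0
print(t2[0])   # tensor(42.): both views alias the same element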

For a more general case:

import torch
def same_storage(x, y):
    return x.storage().data_ptr() == y.storage().data_ptr()

# It works on your test.
t = torch.rand((3,3))
t1 = t[0,:]
t2 = t[:,0]

print(same_storage(t1, t2)) # prints True
 
x = torch.arange(10)
y = x[1::2]
print(same_storage(x, y)) # prints True
z = y.clone()
print(same_storage(x, z)) # prints False
print(same_storage(y, z)) # prints False

@albanD can you confirm that x.storage().data_ptr() == y.storage().data_ptr() is the correct way to check if two tensors x and y share the same storage?

Hey,

While it does work, I am not sure I would ever recommend using it :smiley:
If you need to do this check, you are most likely doing something very weird. And you should understand what the storage is in this case to know what you're doing.


@albanD keep up the good work. Thanks.

@albanD Why is var.storage().data_ptr() sometimes different from var.data_ptr()? What does a tensor wrap, other than the "storage", that makes the memory addresses differ? And why is id(x) different from x.data_ptr()? Isn't it supposed to be the same?
Thanks in advance

Hi,

var.data_ptr() will point to the beginning of the data used by the Tensor. This might not be the beginning of the storage. You can check var.storage_offset() to see the offset in question.
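
For example, something along these lines should show the relation for a simple 1D slice:

import torch

x = torch.arange(10)
y = x[3:]                  # view starting 3 elements into x's storage
print(y.storage_offset())  # 3
print(y.data_ptr() == y.storage().data_ptr() + y.storage_offset() * y.element_size())  # True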

id(x) is Python giving you a unique id for the Python object x. This has nothing to do with the storage of a Tensor.
If we were using the data_ptr() for this, you can see that this id would very quickly stop being unique as soon as two Tensors look at the same memory.
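
For instance, a tensor and a view of it share memory but are distinct Python objects:

import torch

x = torch.randn(4, 4)
y = x.view(16)                       # same memory, different Python object
print(x.data_ptr() == y.data_ptr())  # True: both start at the same address
print(id(x) == id(y))                # False: two distinct Python objects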