Data type when changing a tensor to a numpy array

yubeic · November 10, 2017, 10:20pm

I’m new to PyTorch, which my peers suggested to me.

I’m a little confused about the data structure that results from converting a tensor to a numpy array. The following is such a conversion:

import sys
import torch

test0 = torch.rand(100, 100)
test1 = test0.numpy()
print(sys.getsizeof(test1))

This gives me an output of 112, where I expected 80112. Can anyone help me understand this outcome?

Thanks a lot!

jmandivarapu1 · November 10, 2017, 10:57pm

What do you mean exactly?
sys.getsizeof(test1) returns the size of the object in bytes.
Do you want to check how much memory the data structure takes up?

lantiga · November 11, 2017, 12:14am

For reference, calling .numpy() on a CPU tensor returns an array that shares the same buffer with the tensor. No data is copied, and the element type is the same.
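A minimal sketch of what that sharing looks like in practice, assuming a recent PyTorch and NumPy install. The variable names are only illustrative, and the sizes are read from the array's and tensor's own attributes rather than from sys.getsizeof:

```python
import numpy as np
import torch

tensor = torch.zeros(100, 100)   # float32 by default
array = tensor.numpy()           # no copy: the array wraps the tensor's buffer

tensor[0, 0] = 1.0               # write through the tensor...
print(array[0, 0])               # ...and the array sees the change

print(array.dtype)               # element type carried over from the tensor
print(array.nbytes)              # bytes in the data buffer (elements * itemsize)
print(tensor.element_size() * tensor.nelement())  # same count, from the tensor side
print(array.flags.owndata)       # whether NumPy considers the buffer its own
```

If the goal is to know how much memory the data takes, array.nbytes or element_size() * nelement() is probably a more direct measure than sys.getsizeof.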

getsizeof internally calls the __sizeof__ method on the object, then adds the garbage collector overhead (if the object is managed by the garbage collector) and returns the total. Apparently __sizeof__ on PyTorch tensors doesn’t take the size of the underlying buffer into account:

import numpy as np
import torch

torch.ones(10, 10).__sizeof__()    # 40
torch.ones(100, 100).__sizeof__()  # 40
np.ones((10, 10)).__sizeof__()     # 912
np.ones((100, 100)).__sizeof__()   # 80112
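The mechanism can be reproduced without PyTorch. The sketch below uses a hypothetical Wrapper class (not part of any library) whose __sizeof__ returns a fixed number and ignores its payload, which is roughly the behaviour observed above:

```python
import sys

class Wrapper:
    """Hypothetical object that holds a large payload but reports a fixed size."""

    def __init__(self, n):
        self.payload = bytearray(n)   # stands in for the underlying buffer

    def __sizeof__(self):
        return 40                     # ignores len(self.payload)

small = Wrapper(100)
big = Wrapper(1_000_000)

# getsizeof = __sizeof__() plus a fixed garbage-collector header for GC-tracked objects
print(sys.getsizeof(small), sys.getsizeof(big))   # identical values
print(sys.getsizeof(big) - big.__sizeof__())      # the added GC overhead
print(sys.getsizeof(big.payload))                 # the buffer, measured directly
```

Both instances report the same size regardless of payload, so getsizeof is only as informative as the object's own __sizeof__.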

@smth is this something that needs fixing at the ATen level?

smth · November 11, 2017, 4:21pm

Yeah, it seems worth fixing.

lantiga · November 12, 2017, 12:17pm

Right, I’ll open an issue.