How to know the memory allocated for a tensor on gpu?

How can we find the total memory allocated for a tensor on the GPU? Both statements below return 72. It looks like I am missing something.

print(sys.getsizeof(torch.FloatTensor([0.5, 0.7])))
print(sys.getsizeof(torch.FloatTensor([0.5, 0.7]).cuda()))

Or is it safe to calculate that if a float tensor is on gpu, then the memory consumed by the tensor in total is 4 bytes * length_of_tensor ?

It would also be useful to know how to calculate the memory consumed by any object on the GPU. Eg:


sys.getsizeof() returns the size of the Python object only. It will be the same for all tensors, because every tensor is a Python object that wraps the underlying tensor data.
Each tensor has a method element_size() that gives the size of one element in bytes, and a method nelement() that returns the number of elements.
So the size of a tensor a in memory (cpu memory for a cpu tensor and gpu memory for a gpu tensor) is a.element_size() * a.nelement().
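As a minimal sketch of the formula above (the helper name tensor_nbytes is made up for illustration, and the tensor is only moved to the GPU if one is available):

```python
import torch

def tensor_nbytes(t: torch.Tensor) -> int:
    """Data bytes of a contiguous tensor: bytes per element times element count."""
    return t.element_size() * t.nelement()

a = torch.tensor([0.5, 0.7], dtype=torch.float32)
if torch.cuda.is_available():
    a = a.cuda()  # the formula is the same on CPU and GPU

print(a.element_size())   # 4 bytes per float32 element
print(a.nelement())       # 2 elements
print(tensor_nbytes(a))   # 2 * 4 = 8 bytes of tensor data
```

This counts only the tensor data itself. The amount the allocator actually reserves on the device may be larger.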

All objects are stored in CPU memory. Among PyTorch objects, only tensors can use GPU memory. So the GPU memory used by any object is the memory used by the GPU tensors it contains.
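For a module, this could be sketched by summing over the tensors it holds. The helper module_tensor_bytes and the nn.Linear model are hypothetical examples, and the code assumes the only tensors of interest are the module's parameters and buffers:

```python
import torch
from torch import nn

def module_tensor_bytes(module: nn.Module, device_type: str = "cuda") -> int:
    """Sum the data bytes of parameters and buffers located on `device_type`."""
    total = 0
    for t in list(module.parameters()) + list(module.buffers()):
        if t.device.type == device_type:
            total += t.element_size() * t.nelement()
    return total

# Hypothetical model: 10 * 5 weights + 5 biases = 55 float32 values.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(10, 5).to(device)

print(module_tensor_bytes(model, device))  # 55 * 4 = 220
```

On a GPU, torch.cuda.memory_allocated() can serve as a cross-check. It may report somewhat more than this sum, since the allocator can round allocations up.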


Thank you for the detailed reply @albanD! Regarding the other objects on the GPU: are you saying that the functions which track the graph for each tensor, etc., are stored on the CPU?

Yes, they don’t actually need to be on GPU (and can’t). Only Tensors need to be there to perform operations on them.


I have a follow-up question: how do you determine how much memory the computational graph of a tensor occupies? This is sometimes important for memory-bottlenecked networks, since one can switch to a less memory-hungry model with comparable performance, if such a model exists (for example, a GRU instead of an LSTM).


Does nelement take into account duplicated memory in a view object? If I understand correctly the memory requirements for img and img_3_dims should be the same!

I.e. this code:

img.shape, type(img)
(torch.Size([256, 256]), <class 'torch.Tensor'>)
img.element_size(), img.nelement()
(4, 65536)
img_3_dims = img.view(img.shape + (1,)).expand(-1,-1,3)
img_3_dims.element_size(), img_3_dims.nelement()
(4, 196608)


The formula above only works for contiguous Tensors.
If you start to play with strides (which is what expand does) or slicing, it becomes very complex to know the memory usage. But such Tensors should be fairly rare in regular applications.
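One way to see the gap for such tensors is to compare the formula with the size of the underlying storage. This is only a sketch: it assumes a recent PyTorch release where Tensor.untyped_storage() is available, and storage_nbytes is a made-up helper name:

```python
import torch

def storage_nbytes(t: torch.Tensor) -> int:
    """Bytes in the buffer backing `t` (shared by every view of that buffer)."""
    return t.untyped_storage().nbytes()

img = torch.zeros(256, 256)  # float32, contiguous
img_3_dims = img.view(img.shape + (1,)).expand(-1, -1, 3)

formula_bytes = img_3_dims.element_size() * img_3_dims.nelement()
print(formula_bytes)               # 786432: the formula counts each element 3 times
print(storage_nbytes(img_3_dims))  # 262144: the buffer actually backing the view
print(storage_nbytes(img))         # 262144: same buffer as the original image
```

Note that the storage size describes the whole buffer, so a small slice of a large tensor still reports the full buffer size.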

Thanks for clarification @albanD

I used the line of code below to convert a gray-scale image to a 3-channel image so I can use my data with a pretrained model (for RGB input), without using triple the memory. I was hoping to confirm I was saving memory using view() and expand(), but I can’t find a way to explicitly measure this? Otherwise, I’ll probably just assume it’s doing what I think.

img_3_dims = img.view(img.shape + (1,)).expand(-1,-1,3)

Neither expand nor view ever allocates memory. So if you only use these, you can be sure that you don’t use extra memory.
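A quick check along these lines, as a sketch rather than an official recipe, is to compare data pointers and look at the strides. The random image below stands in for real data:

```python
import torch

img = torch.rand(256, 256)  # placeholder for a real gray-scale image
img_3_dims = img.view(img.shape + (1,)).expand(-1, -1, 3)

# The expanded tensor starts at the same memory address as the original.
same_buffer = img_3_dims.data_ptr() == img.data_ptr()
print(same_buffer)          # True

# A stride of 0 in the last dimension means the 3 channels reuse the same values.
print(img_3_dims.stride())  # (256, 1, 0)

# Forcing a contiguous layout is what allocates a new, 3x larger buffer.
copied = img_3_dims.contiguous()
copied_is_new = copied.data_ptr() != img.data_ptr()
print(copied_is_new)        # True
```

Operations that require a contiguous input may make such a copy internally, so the saving applies to the expanded tensor itself, not necessarily to everything computed from it.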
