I recently stumbled upon something I don’t understand: when creating a float tensor on the GPU with only one element, I would expect it to take up 4 bytes of memory. However, torch.cuda.max_memory_allocated() reports 512 bytes. Why is that?
MWE to replicate:
import torch
print(torch.cuda.max_memory_allocated())
a = torch.tensor(1.0, device='cuda')
print(torch.cuda.max_memory_allocated()) # 512 bytes
print(torch.cuda.max_memory_reserved()) # 2097152 bytes = 2 MB
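For reference, the tensor’s own accounting agrees with the 4 bytes I expected (continuing from the MWE above):

print(a.element_size())                # 4 -- bytes per float32 element
print(a.element_size() * a.nelement()) # 4 -- total bytes of actual tensor data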
I’m aware that the CUDA context must be created on the GPU as well, which is why nvidia-smi shows values far higher than the 2 MB reserved, around 1 GB, but that’s not what I’m asking here. Why does PyTorch allocate half a kilobyte for a 4-byte tensor, and why does it reserve 2 MB of cache when creating said tensor?
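One way to probe this (just a sketch of an experiment, not a conclusion) would be to allocate tensors of increasing size and watch how the allocator’s bookkeeping grows:

import torch

prev = torch.cuda.memory_allocated()
keep = []  # hold references so nothing gets freed mid-loop
for n in (1, 128, 129, 256, 1024):  # element counts, i.e. 4*n requested bytes
    keep.append(torch.ones(n, device='cuda'))
    cur = torch.cuda.memory_allocated()
    # if 512 bytes is a minimum allocation granularity, these deltas should
    # come out as multiples of 512 rather than the raw 4*n
    print(4 * n, cur - prev)
    prev = cur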
Following this answer on Stack Overflow, I also tried
import sys
print(sys.getsizeof(a)) # 64
print(sys.getsizeof(a.storage())) # 60
which unfortunately is even more confusing. Does anybody know what’s going on here?
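My best guess so far is that sys.getsizeof only measures the host-side Python wrapper object, not the CUDA buffer it points to, which would make the 64 and 60 bytes unrelated to the 512 bytes above. If that’s right, a plain CPU tensor should report a similar wrapper size:

import sys
import torch

b = torch.tensor(1.0)    # same one-element tensor, just on the CPU
print(sys.getsizeof(b))  # I’d expect roughly the same wrapper size as for the CUDA tensor

But even if that’s the case, it still leaves the 512-byte and 2 MB numbers unexplained.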