Discrepancy Between Expected and Actual GPU Memory Usage for Large Tensors

Hi all,

I have noticed that the actual GPU memory consumed by large tensors is more than what is theoretically expected.

Example:

A Float32 tensor with 2,900,000 elements (shape (2900000, 1)) occupies approximately 12,288 KiB of GPU memory. However, based on my calculation (2,900,000 elements * 4 bytes), the expected memory usage should be around 11,328.125 KiB.

Here’s the code that reproduces the pattern:

import torch

# Specify the size of the tensor
tensor_size = (2900000, 1)

# Create a random tensor on the CPU
random_tensor = torch.rand(tensor_size)

# Expected size in KiB (number of elements * bytes per element)
expected_size = tensor_size[0] * tensor_size[1] * random_tensor.element_size() / 1024

# Move the tensor to the GPU and read back the allocated memory
# (torch.cuda.memory_allocated() reports all tensor allocations on the device;
# here the only CUDA tensor is random_tensor)
random_tensor = random_tensor.to("cuda")
actual_size = torch.cuda.memory_allocated() / 1024

# Print results
print(f"Expected occupied GPU memory by tensor: {expected_size} KiB")
print(f"Actual occupied GPU memory by tensor: {actual_size} KiB")

Is there an explanation for this discrepancy?

Thank you very much!