Hi all,

I have noticed that the actual GPU memory consumed by large tensors is more than what is theoretically expected.

Example:

A float32 tensor with 2,900,000 elements (shape `(2900000, 1)`) occupies approximately 12,288 KiB of GPU memory. However, based on my calculation (2,900,000 elements * 4 bytes), the expected usage should be around 11,328.125 KiB.

Here’s the code that reproduces the pattern:

```python
import torch

# Tensor of 2,900,000 float32 elements
tensor_size = (2900000, 1)
random_tensor = torch.rand(tensor_size)

# Expected size in KiB: number of elements * bytes per element
expected_size = tensor_size[0] * tensor_size[1] * random_tensor.element_size() / 1024

# Measure the memory allocated for this tensor specifically,
# as a before/after delta (memory_allocated() reports the total)
before = torch.cuda.memory_allocated()
random_tensor = random_tensor.to("cuda")
actual_size = (torch.cuda.memory_allocated() - before) / 1024

print(f"Expected occupied GPU memory by tensor: {expected_size} KiB")
print(f"Actual occupied GPU memory by tensor: {actual_size} KiB")
```
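One thing I noticed while checking the numbers: if I assume allocations are rounded up to some 2 MiB granularity (I'm guessing here, I don't know whether the CUDA caching allocator actually does this), the observed value matches exactly. A quick arithmetic check, without any GPU involved:

```python
import math

# Raw tensor size: 2,900,000 float32 elements * 4 bytes each
raw_bytes = 2_900_000 * 4                 # 11,600,000 B = 11,328.125 KiB

# Hypothetical granularity: round the allocation up to 2 MiB segments
segment = 2 * 1024 * 1024                 # 2,097,152 B
rounded_bytes = math.ceil(raw_bytes / segment) * segment

print(f"Raw size:     {raw_bytes / 1024} KiB")      # 11328.125 KiB
print(f"Rounded size: {rounded_bytes / 1024} KiB")  # 12288.0 KiB
```

The rounded figure comes out to exactly the 12,288 KiB I measured, so I suspect the extra memory is allocator granularity rather than anything about the tensor itself, but I'd appreciate confirmation.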

Is there any explanation for that?

Thank you very much!