I came across a quirky behaviour while playing around with sparse tensors: a sparse tensor can take a lot more memory than the corresponding dense tensor.
Example Code
import torch
from pytorch_memlab import MemReporter

DEVICE = 'cuda:0'
n = 1000
c = .6

t1 = torch.randn(100, n, n).to(DEVICE)
t1[torch.rand_like(t1) > c] = 0  # zero out ~40% of entries, leaving ~60% non-zero
t2 = t1.to_sparse()              # convert to sparse COO format
torch.cuda.empty_cache()

reporter = MemReporter()
reporter.report()
Output
Element type Size Used MEM
-------------------------------------------------------------------------------
Storage on cuda:0
Tensor0 (100, 1000, 1000) 381.47M
Tensor2 (3, 60000715) 1.34G
Tensor2 (60000715,) 228.88M
-------------------------------------------------------------------------------
Total Tensors: 340002860 Used Memory: 1.94G
The allocated memory on cuda:0: 1.94G
Memory differs due to the matrix alignment or invisible gradient buffer tensors
-------------------------------------------------------------------------------
t2 occupies around 1.5 GB, compared to t1's 380 MB.
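The numbers actually seem consistent with COO overhead if I do the arithmetic by hand (assuming indices are stored as int64 and values as float32, which the sizes in the reporter output suggest):

nnz = 60_000_715                      # non-zero count reported above
dense_bytes = 100 * 1000 * 1000 * 4   # float32 dense tensor
index_bytes = 3 * nnz * 8             # one int64 index row per dimension
value_bytes = nnz * 4                 # float32 values
print(dense_bytes / 2**20)            # ~381.47 MiB
print(index_bytes / 2**30)            # ~1.34 GiB
print(value_bytes / 2**20)            # ~228.88 MiB

So nearly all of the extra memory appears to be the (3, nnz) int64 index tensor.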
Device Info
PyTorch Version: 1.7.1
CUDA Version: 11.0
Device: GeForce RTX 2070
Is this a bug in the sparse memory format?
I have a large tensor with roughly 30% sparsity and would like to reduce its GPU memory usage. Any suggestions are appreciated.
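For context, a rough break-even estimate I sketched (same int64-index / float32-value assumption as above; this is my own back-of-envelope, not anything from the docs) suggests COO only saves memory for a 3-d float32 tensor below roughly 14% density, which would explain why it is counterproductive in my example:

def coo_vs_dense_bytes(shape, density, value_bytes=4, index_bytes=8):
    # COO stores one index entry per dimension per non-zero, plus the value.
    numel = 1
    for s in shape:
        numel *= s
    nnz = int(numel * density)
    return numel * value_bytes, nnz * (len(shape) * index_bytes + value_bytes)

print(coo_vs_dense_bytes((100, 1000, 1000), 0.60))    # dense ~400 MB vs sparse ~1.68 GB
print(coo_vs_dense_bytes((100, 1000, 1000), 4 / 28))  # break-even: 4/(3*8+4) ≈ 14% density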