Sparse tensors consume more memory than dense tensors

I came across a quirky behaviour while playing around with sparse tensors: a sparse tensor can take a lot more memory than the corresponding dense tensor.

Example Code

import torch
from pytorch_memlab import MemReporter

DEVICE = 'cuda:0'

n = 1000
c = .6

t1 = torch.randn(100, n, n).to(DEVICE)
t1[torch.rand_like(t1) > c] = 0  # zero out roughly 40% of the entries
t2 = t1.to_sparse()              # convert to sparse COO format
torch.cuda.empty_cache()

reporter = MemReporter()
reporter.report()

Output

Element type                                            Size  Used MEM
-------------------------------------------------------------------------------
Storage on cuda:0
Tensor0                                    (100, 1000, 1000)   381.47M
Tensor2                                        (3, 60000715)     1.34G
Tensor2                                          (60000715,)   228.88M
-------------------------------------------------------------------------------
Total Tensors: 340002860 	Used Memory: 1.94G
The allocated memory on cuda:0: 1.94G
Memory differs due to the matrix alignment or invisible gradient buffer tensors
-------------------------------------------------------------------------------

t2 occupies around 1.5 GB, compared to roughly 380 MB for t1.
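Breaking the report down by hand (my own arithmetic, assuming the COO layout stores one int64 index per dimension per non-zero element plus one float32 value), the numbers line up with the output above:

nnz = 60000715                                # non-zeros reported by MemReporter
dense_mb = 100 * 1000 * 1000 * 4 / 1024**2    # float32 dense storage      -> ~381 MB
index_mb = 3 * nnz * 8 / 1024**2              # 3 int64 indices per non-zero -> ~1373 MB (1.34 GB)
value_mb = nnz * 4 / 1024**2                  # 1 float32 value per non-zero -> ~229 MB
print(dense_mb, index_mb, value_mb)

So the index storage alone is more than three times the size of the original dense tensor.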

Device Info

PyTorch Version: 1.7.1
CUDA Version: 11.0
Device: GeForce RTX 2070

Is this a bug in the sparse memory format?
I have a large sparse tensor (~30% sparsity) and would like to reduce its GPU memory usage; any help is appreciated.

I think the memory footprint is expected as described here.
For your sparsity, this would be the memory usage:

import torch

DEVICE = 'cuda:0'

n = 100
c = .6

t1 = torch.randn(100, n, n).to(DEVICE)
t1[torch.rand_like(t1) > c] = 0
t2 = t1.to_sparse()

# int64 indices (8 bytes each) + float32 values (4 bytes each), in MB
print((t2.indices().nelement() * 8 + t2.values().nelement() * 4) / 1024**2)

Note that I reduced the number of elements, but the usage relative to the dense tensor should be the same.
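For completeness, here is a rough sketch (my own comparison, ignoring tensor metadata overhead) that puts the dense and sparse COO footprints side by side; for a 3-D float32 tensor with int64 indices, COO only pays off below roughly 1/7 density:

import torch

n, c = 100, .6
t1 = torch.randn(100, n, n)
t1[torch.rand_like(t1) > c] = 0
t2 = t1.to_sparse()

dense_mb = t1.nelement() * 4 / 1024**2                  # float32 dense storage
sparse_mb = (t2.indices().nelement() * 8 +              # int64 indices
             t2.values().nelement() * 4) / 1024**2      # float32 values
print(f'dense: {dense_mb:.2f} MB, sparse COO: {sparse_mb:.2f} MB')

# break-even density: nnz * (3 * 8 + 4) < numel * 4
print('COO is smaller below ~{:.0%} density'.format(4 / (3 * 8 + 4)))

At ~60% density (as in your example) the indices dominate, which is why the sparse tensor ends up several times larger than the dense one.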