When computing the gradient of a sparse tensor on the GPU, I ran into an out-of-memory error.
I need a tensor of size
[1, 3, 224 * 224, 224 * 224] which has only 1 * 3 * 224 * 224 * 25 nonzero entries, so I store it in sparse COO format.
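For concreteness, here is a scaled-down sketch of how such a tensor can be built (all sizes here are hypothetical stand-ins; the real case uses N = 224 * 224 and K = 25 nonzeros per row):

```python
import torch

# Scaled-down stand-in sizes; the real case is C=3, N=224*224, K=25.
C, N, K = 3, 16, 5
nnz = C * N * K  # total nonzero entries, mirroring 1 * 3 * 224*224 * 25

# K random column indices for each (channel, row) pair.
rows = torch.arange(N).repeat_interleave(K)     # row index of each nonzero
cols = torch.randint(0, N, (N * K,))            # column index of each nonzero

indices = torch.stack([
    torch.zeros(nnz, dtype=torch.long),          # batch index (always 0)
    torch.arange(C).repeat_interleave(N * K),    # channel index
    rows.repeat(C),                              # row index
    cols.repeat(C),                              # column index
])
values = torch.rand(nnz)
a = torch.sparse_coo_tensor(indices, values, size=(1, C, N, N))
```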
I take the summation over the last dim, then I get a tensor of size
[1, 3, 224 * 224]:
b = torch.sparse.sum(a, dim=3).to_dense()
I reshape the tensor into size
[1, 3, 224, 224]:
b = b.view(1, 3, 224, 224)
and then feed it into a neural network.
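The whole pipeline can be reproduced end-to-end on small, hypothetical sizes (the network here is just a placeholder Conv2d; the real case with N = 224 * 224 is what OOMs during backward on the GPU):

```python
import torch

# Hypothetical scaled-down sizes; the real case uses H = W = 224.
C, H, W, K = 3, 16, 16, 5
N = H * W

idx = torch.stack([
    torch.zeros(C * N * K, dtype=torch.long),        # batch index
    torch.arange(C).repeat_interleave(N * K),        # channel index
    torch.arange(N).repeat_interleave(K).repeat(C),  # row index
    torch.randint(0, N, (C * N * K,)),               # column index
])
vals = torch.rand(C * N * K, requires_grad=True)
a = torch.sparse_coo_tensor(idx, vals, size=(1, C, N, N))

b = torch.sparse.sum(a, dim=3).to_dense()  # [1, C, N]
b = b.view(1, C, H, W)                     # [1, C, H, W]

net = torch.nn.Conv2d(C, 8, kernel_size=3, padding=1)  # placeholder network
out = net(b)
out.sum().backward()  # gradient flows back to the sparse values
```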
The memory required to execute this computation should be very small, and indeed the forward pass is fine. However, I encounter an error during the backward pass:
RuntimeError: CUDA out of memory. Tried to allocate 28.14 GiB
It seems that PyTorch creates a dense tensor of size
[1, 3, 224 * 224, 224 * 224] during the backward pass, which would consume exactly that amount of memory (1 * 3 * 50176 * 50176 float32 entries * 4 bytes ≈ 28.14 GiB).
I am wondering whether I am misusing the sparse tensor functionality in a way that makes autodiff construct the full dense matrix explicitly during backprop?