Padding a list of sparse tensors

I have a list of csr sparse matrices:

X:
[<9x15466 sparse matrix of type '<class 'numpy.float64'>'
 	with 115 stored elements in Compressed Sparse Row format>,
 <201x15466 sparse matrix of type '<class 'numpy.float64'>'
 	with 4430 stored elements in Compressed Sparse Row format>,
 <98x15466 sparse matrix of type '<class 'numpy.float64'>'
 	with 2567 stored elements in Compressed Sparse Row format>,
 <24x15466 sparse matrix of type '<class 'numpy.float64'>'
 	with 368 stored elements in Compressed Sparse Row format>,
 <55x15466 sparse matrix of type '<class 'numpy.float64'>'
 	with 904 stored elements in Compressed Sparse Row format>]

which I stack and convert to a single torch sparse tensor:
X = sp.vstack(X).tocoo()
X = sparse_coo_to_tensor(X)
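For reference, `sparse_coo_to_tensor` is just a small helper; my version looks roughly like this (names and details are my own, not from any library):

```python
import numpy as np
import scipy.sparse as sp
import torch

def sparse_coo_to_tensor(coo: sp.coo_matrix) -> torch.Tensor:
    """Convert a scipy COO matrix to a torch sparse COO tensor."""
    # Stack row/col indices into the (2, nnz) layout torch expects.
    indices = torch.from_numpy(
        np.vstack((coo.row, coo.col)).astype(np.int64))
    values = torch.from_numpy(coo.data)
    return torch.sparse_coo_tensor(indices, values, coo.shape)
```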

X:
tensor(indices=tensor([[    0,     0,     0,  ...,   386,   386,   386],
                       [ 1738,  1740,  3404,  ..., 15457, 15461, 15465]]),
       values=tensor([1., 1., 1.,  ..., 1., 1., 1.]),
       size=(387, 15466), nnz=8384, layout=torch.sparse_coo)

X[0]:
tensor(indices=tensor([[ 1738,  1740,  3404,  3405, 15427, 15431, 15436, 15437,
                        15445, 15448, 15453, 15457, 15464, 15465]]),
       values=tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]),
       size=(15466,), nnz=14, layout=torch.sparse_coo)

I want to be able to use pad_sequence in my collate_fn(); however, pad_sequence doesn’t seem to work with the torch.sparse_coo layout:

torch.nn.utils.rnn.pad_sequence(X)

D:\bo\envs\bd\lib\site-packages\torch\nn\utils\rnn.py in pad_sequence(sequences, batch_first, padding_value)
    365     out_dims = (max_len, len(sequences)) + trailing_dims
    366
--> 367     out_tensor = sequences[0].new_full(out_dims, padding_value)
    368     for i, tensor in enumerate(sequences):
    369         length = tensor.size(0)

RuntimeError: full(...) is not implemented for sparse layout

Does anyone have experience padding sparse tensors and feeding them to models such as RNNs, Transformers, etc.? Do I have to convert them to dense tensors? My motivation for using sparse tensors in the first place is to avoid OOM errors, since I have a lot of very high-dimensional feature vectors. Thanks!
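To make the goal concrete, the only version I can get working densifies each sample inside collate_fn before padding, which defeats the purpose of keeping things sparse (a minimal sketch, assuming each batch element is a 2-D sparse COO tensor):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def collate_fn(batch):
    # batch: list of 2-D sparse COO tensors of shape (seq_len_i, n_features).
    # Densifying first makes pad_sequence work, but gives up the
    # memory savings that motivated using sparse tensors.
    dense = [t.to_dense() for t in batch]
    # Returns a dense tensor of shape (B, max_len, n_features).
    return pad_sequence(dense, batch_first=True)
```

Is there a way to get the same padded-batch behaviour without materializing the full dense tensors?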