Better way to forward sparse matrix

I have a very sparse dataset stored as a scipy sparse csr_matrix, and it is too large to convert to a single dense numpy array. For now I can only extract part of it, convert that part to a numpy array and then to a tensor, and forward the tensor. But the csr_matrix-to-numpy-array step is still awfully time-consuming. I wonder whether there is a better way to feed the sparse matrix.

There seems to be experimental support for sparse matrices in PyTorch. I’ve never used them before but maybe this will be helpful - torch.sparse

EDIT: You might want to have a look at this discussion on GitHub regarding the state of sparse tensors in PyTorch.

Thank you for your timely reply. I read the torch.sparse section of the PyTorch documentation before posting but wasn’t aware of the GitHub discussion.

Right now I have a solution as below, which is quite fast:


import torch


def spy_sparse2torch_sparse(data):
    """Convert a scipy sparse CSR matrix to a sparse torch tensor.

    :param data: a scipy sparse csr_matrix
    :return: a sparse torch tensor
    """
    samples, features = data.shape
    # COO format exposes the (row, col) index pairs directly
    coo_data = data.tocoo()
    indices = torch.LongTensor([coo_data.row, coo_data.col])
    # Take the values from the COO matrix so they line up with the indices
    values = torch.from_numpy(coo_data.data).float()
    return torch.sparse.FloatTensor(indices, values, [samples, features])
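As a quick sanity check, the same conversion steps can be exercised on a tiny matrix. This is a sketch using the newer `torch.sparse_coo_tensor` constructor, which replaces the `torch.sparse.FloatTensor` constructor used in the function above in more recent PyTorch versions:

```python
import numpy as np
import torch
from scipy import sparse

# Tiny 2x3 CSR matrix: [[0, 1, 0], [2, 0, 3]]
csr = sparse.csr_matrix(np.array([[0., 1., 0.],
                                  [2., 0., 3.]]))

# Same steps as spy_sparse2torch_sparse, via the modern constructor
coo = csr.tocoo()
indices = torch.tensor(np.vstack([coo.row, coo.col]), dtype=torch.long)
t = torch.sparse_coo_tensor(indices, torch.from_numpy(coo.data).float(),
                            csr.shape)

# Densifying this tiny tensor recovers the original matrix
print(t.to_dense())
```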

But it is still not very helpful. When I call print(t[0]), it raises RuntimeError: Sparse tensors do not have strides. How should I extract a mini-batch from it, then?

Calling the .to_dense() method is not an option because it raises RuntimeError: $ Torch: not enough memory: you tried to allocate 141GB. Buy new RAM! at /pytorch/aten/src/TH/THGeneral.c:218

I’m able to reproduce the same error when I run similar code on PyTorch 0.4.1. I can’t help you out here. Maybe @albanD or @smth have some insights.

Hi,

That conversion should work.
At the moment you cannot access elements of sparse tensors by indexing, but you can access their indices and values directly.
You can also perform some pointwise operations on them and use them in matrix-matrix multiplications.
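To illustrate what is and isn't available, here is a small sketch using the current-API names (`torch.sparse_coo_tensor` and `torch.sparse.mm`; the thread itself predates these, so treat the exact calls as an assumption for newer PyTorch):

```python
import torch

# A tiny 3x4 sparse tensor with three non-zero entries
indices = torch.tensor([[0, 1, 2],   # row indices
                        [1, 0, 3]])  # column indices
values = torch.tensor([1.0, 2.0, 3.0])
t = torch.sparse_coo_tensor(indices, values, (3, 4)).coalesce()

# Indexing like t[0] raises "Sparse tensors do not have strides",
# but the indices and values are directly accessible:
print(t.indices())  # the 2 x nnz index matrix
print(t.values())   # the nnz values

# Sparse x dense matrix multiplication works:
dense = torch.ones(4, 2)
print(torch.sparse.mm(t, dense))
```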

What do you want to do with them? What are the operations your net needs to be able to do with them that are not available?

Thanks a lot!

I need to sample a mini-batch from the whole dataset, feed that mini-batch to a classifier, and update the classifier’s weights. If mini-batch sampling is possible, I can finish the task.

My PyTorch version is 0.4.0. If there is some mechanism to do that, it would be great.

Hi,

I am afraid functions like .index_select() are not available for sparse tensors at the moment, and you would need them to get a mini-batch from your dataset.
You could potentially keep your data as a scipy matrix, extract the mini-batch from the scipy matrix (I expect this is possible, but I don’t know), and then convert the mini-batch to a torch (sparse) tensor just before feeding it to your net.
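The suggestion above can be sketched as follows. The matrix here is a small random stand-in for the real dataset, and the batch size is arbitrary; the point is that CSR supports cheap row slicing, so only the mini-batch ever gets densified:

```python
import numpy as np
import torch
from scipy import sparse

# Stand-in for the large sparse dataset: a small random CSR matrix
rng = np.random.default_rng(0)
data = sparse.random(100, 50, density=0.05, format="csr", random_state=0)

# Sample a mini-batch of row indices and slice the scipy matrix;
# CSR row slicing is fast, so this stays cheap for huge datasets
batch_idx = rng.choice(data.shape[0], size=8, replace=False)
batch = data[batch_idx]          # still a scipy csr_matrix, 8 x 50

# Only the mini-batch is densified, so memory is bounded by the
# batch size rather than the full dataset
batch_tensor = torch.from_numpy(batch.toarray()).float()
print(batch_tensor.shape)  # torch.Size([8, 50])
```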


@11130 you could think of contributing to this thread - Sparse tensor use cases