Creating a sparse tensor from CSR Matrix

jbarrow · February 14, 2018, 7:43pm

Is there a straightforward way to go from a scipy.sparse.csr_matrix (the kind returned by an sklearn CountVectorizer) to a torch.sparse.FloatTensor?

Currently, I’m just using torch.from_numpy(X.todense()), but for large vocabularies that eats up quite a bit of RAM.
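
To put a rough number on the RAM cost, here is a small sketch comparing the two layouts. The matrix is a hypothetical stand-in for a CountVectorizer result (1,000 documents, a 50,000-term vocabulary, 0.1% of entries nonzero); those sizes are assumptions for illustration, not figures from this thread.

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-in for a CountVectorizer result (sizes are assumed).
X = sparse.random(1000, 50000, density=0.001, format="csr",
                  dtype=np.float64, random_state=0)

# Memory held by the CSR representation: values + column indices + row pointers.
sparse_bytes = X.data.nbytes + X.indices.nbytes + X.indptr.nbytes

# Memory a dense copy (what X.todense() allocates) would need.
dense_bytes = X.shape[0] * X.shape[1] * X.dtype.itemsize

print(f"sparse: {sparse_bytes / 1e6:.1f} MB, dense: {dense_bytes / 1e6:.1f} MB")
```

The dense copy here is 1000 * 50000 * 8 bytes = 400 MB, while the CSR arrays stay well under 1 MB.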

richard · February 14, 2018, 11:14pm

You could convert a csr format matrix to coo, and then process that a little before sending it into the sparse tensor constructor. I believe scipy’s coo format looks similar to pytorch’s sparse tensors.
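
A minimal sketch of that conversion on the scipy side, assuming a tiny hand-built matrix in place of a real CountVectorizer output. The COO matrix exposes `row`, `col`, and `data` arrays, which map onto the indices and values a sparse tensor needs.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Small CSR matrix standing in for a CountVectorizer result.
X = csr_matrix(np.array([[0, 0, 3],
                         [4, 0, 5]], dtype=np.float32))

coo = X.tocoo()  # same entries, stored in coordinate (COO) layout

# The three pieces a COO-style sparse tensor is built from:
indices = np.vstack((coo.row, coo.col))  # shape (2, nnz): rows on top, columns below
values = coo.data                        # shape (nnz,)
size = coo.shape                         # (n_rows, n_cols)
```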

BrendanMartin · March 9, 2018, 4:13am

@jbarrow @richard I’m attempting to solve the same problem, but I’m getting a little lost after converting to COO format. I’m not sure which attributes of the new matrix to pass to the sparse tensor constructor.

richard · March 9, 2018, 3:09pm

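The sparse tensor constructor is:
torch.sparse.FloatTensor(indices, values, size)
Here indices is a 2 x nnz LongTensor of (row, column) coordinates, values is a tensor of the nnz nonzero entries, and size is the shape of the full tensor.
An example can be found here: http://pytorch.org/docs/master/sparse.html?highlight=sparse%20tensor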
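
As a rough illustration of those three arguments, here is a sketch using `torch.sparse_coo_tensor`, the factory function that more recent PyTorch releases provide for the same (indices, values, size) layout. Swapping it in for `torch.sparse.FloatTensor` is my choice here, and the numbers are made up for the example.

```python
import torch

# Coordinates of the nonzero entries: first row holds row indices,
# second row holds column indices.
indices = torch.tensor([[0, 1, 1],
                        [2, 0, 2]], dtype=torch.int64)
values = torch.tensor([3.0, 4.0, 5.0])  # one value per coordinate pair
size = (2, 3)                           # shape of the full tensor

s = torch.sparse_coo_tensor(indices, values, size)
dense = s.to_dense()  # only for inspecting a small example
```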

bcol · April 21, 2019, 12:40pm

I found the answer here.

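import torch
import numpy as np
from scipy.sparse import coo_matrix

# Example COO matrix: values [3, 4, 5] at (row, col) = (0, 2), (1, 0), (1, 2)
coo = coo_matrix(([3,4,5], ([0,1,1], [2,0,2])), shape=(2,3))

# Pull out the nonzero values and stack the row/column indices into a 2 x nnz array
values = coo.data
indices = np.vstack((coo.row, coo.col))

# Convert to the tensor types the sparse constructor expects
i = torch.LongTensor(indices)
v = torch.FloatTensor(values)
shape = coo.shape

# Build the sparse tensor; to_dense() is only here to display the result
torch.sparse.FloatTensor(i, v, torch.Size(shape)).to_dense()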

Output

0 0 3
4 0 5
[torch.FloatTensor of size 2x3]
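
For the original question (going straight from a scipy.sparse.csr_matrix), the steps above could be wrapped into one helper. This is only a sketch: `csr_to_torch_sparse` is a hypothetical name, not a PyTorch or scipy function, and it assumes a PyTorch version that provides `torch.sparse_coo_tensor`.

```python
import numpy as np
import torch
from scipy.sparse import csr_matrix

def csr_to_torch_sparse(X, dtype=torch.float32):
    """Hypothetical helper: scipy CSR matrix -> torch sparse COO tensor."""
    coo = X.tocoo()
    # scipy may store indices as int32; torch expects int64 indices.
    indices = torch.from_numpy(
        np.vstack((coo.row, coo.col)).astype(np.int64))
    # Cast the values (e.g. integer counts) to the requested dtype.
    values = torch.from_numpy(coo.data).to(dtype)
    return torch.sparse_coo_tensor(indices, values, torch.Size(coo.shape))

# Usage on a small matrix; a real CountVectorizer output would go here instead.
X = csr_matrix(np.array([[1, 0, 2],
                         [0, 0, 0],
                         [0, 3, 0]]))
S = csr_to_torch_sparse(X)
```

The dense matrix is never materialized, so memory use scales with the number of nonzero entries rather than with the vocabulary size.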