Is there a straightforward way to go from a scipy.sparse.csr_matrix (the kind returned by an sklearn CountVectorizer) to a sparse PyTorch tensor?
Currently, I’m just using torch.from_numpy(X.todense()), but for large vocabularies that eats up quite a bit of RAM.
You could convert a csr format matrix to coo, and then process that a little before sending it into the sparse tensor constructor. I believe scipy’s coo format looks similar to pytorch’s sparse tensors.
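To make the CSR to COO step concrete, here is a minimal sketch using only SciPy and NumPy. The small matrix X is a made-up stand-in for a CountVectorizer result, not data from this thread. It shows the three arrays a COO matrix exposes (row, col, data), which are the pieces a coordinate-style sparse tensor needs.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical stand-in for a CountVectorizer output:
# 2 documents, 3 vocabulary terms, 3 non-zero counts.
X = csr_matrix(np.array([[0, 0, 3], [4, 0, 5]]))

# Convert CSR -> COO. This stays sparse; no dense array is built.
coo = X.tocoo()

print(coo.row)    # row index of each stored entry
print(coo.col)    # column index of each stored entry
print(coo.data)   # value of each stored entry
print(coo.shape)  # overall matrix shape
```

Each stored entry is described by one (row, col, value) triple, so memory use scales with the number of non-zeros rather than with the vocabulary size.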
@jbarrow @richard I’m attempting to solve the same problem, but I’m getting a little lost after converting to COO format. I’m not sure which attributes of the new matrix to pass to the sparse tensor constructor.
The sparse tensor constructor is torch.sparse.FloatTensor(indices, values, size).
An example can be found here: http://pytorch.org/docs/master/sparse.html?highlight=sparse%20tensor
I found the answer here.
import numpy as np
import torch
from scipy.sparse import coo_matrix
# 2x3 example matrix with three non-zero entries
coo = coo_matrix(([3,4,5], ([0,1,1], [2,0,2])), shape=(2,3))
values = coo.data
# stack row and column indices into a (2, nnz) array
indices = np.vstack((coo.row, coo.col))
i = torch.LongTensor(indices)
v = torch.FloatTensor(values)
shape = coo.shape
torch.sparse.FloatTensor(i, v, torch.Size(shape)).to_dense()
0 0 3
4 0 5
[torch.FloatTensor of size 2x3]
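For anyone starting from a csr_matrix rather than a coo_matrix, the same steps can be wrapped in a small function. This is only a sketch: the helper name scipy_to_torch_sparse is made up for illustration, and it assumes a PyTorch version that provides the torch.sparse_coo_tensor factory function, which is used here in place of the torch.sparse.FloatTensor constructor shown above. It also assumes float32 values are acceptable.

```python
import numpy as np
import torch
from scipy.sparse import csr_matrix


def scipy_to_torch_sparse(matrix):
    """Hypothetical helper: SciPy sparse matrix -> torch sparse COO tensor."""
    coo = matrix.tocoo()  # CSR (or any SciPy sparse format) -> COO
    # Indices must be int64 with shape (2, nnz): rows first, then columns.
    indices = torch.from_numpy(np.vstack((coo.row, coo.col)).astype(np.int64))
    # Assumption: float32 values are what the downstream model expects.
    values = torch.from_numpy(coo.data.astype(np.float32))
    return torch.sparse_coo_tensor(indices, values, coo.shape)


# Made-up example input standing in for a CountVectorizer result.
X = csr_matrix(np.array([[0, 0, 3], [4, 0, 5]]))
sparse_X = scipy_to_torch_sparse(X)

# Densify only to inspect a small example; avoid this on large matrices.
print(sparse_X.to_dense())
```

The matrix is never densified inside the helper, so it avoids the RAM cost of X.todense() that the original question mentions.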