Matrix multiplication with values only at masked entries

I want to multiply two dense matrices A (1000000, 300) and B (1000000, 300). The resulting matrix would be of size million x million, which of course won't fit into memory and would take forever to compute. However, I only want the values at a few elements, which are specified in a sparse matrix C (1000000, 1000000).
Is there an efficient way (possibly one that utilizes GPUs) to do this in PyTorch?

Did you try just iterating over the required elements of the C matrix and taking the dot product of the corresponding row of A and column of B? Of course, I believe there can be more efficient solutions, but I think this is the most basic one.

While iterating, you can also use multiprocessing to run the computations for the various required elements of C on multiple GPUs.
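
For example (a rough sketch; here A and B are both (n, k) as in the question, so each requested entry of A @ B.t() is the dot product of a row of A with a row of B):

import torch

A = torch.randn(100, 30)
B = torch.randn(100, 30)
C = torch.eye(100).to_sparse()              # stand-in for the real (10^6, 10^6) mask

idx = C.coalesce().indices()                # (2, nnz) tensor of requested positions
vals = torch.empty(idx.shape[1])
for i in range(idx.shape[1]):
    r, c = idx[0, i], idx[1, i]
    vals[i] = A[r] @ B[c]                   # row r of A dotted with row c of B
res = torch.sparse_coo_tensor(idx, vals, C.shape)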

Yes, but that takes a lot of time. I wanted a vectorized solution that uses built-in functions.

There’s a way to do this if we restrict your mask M:
Let’s say you have a mask M such that no two 1’s in the mask are in the same row or column.

Then the following should be equivalent to (z @ y) * M, where @ is matrix multiplication:
(z.t() * (y @ M.t())).to_dense().sum(dim=0).diag() @ M
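
As a quick sanity check on small dense tensors (just a sketch, using a permutation matrix as a mask that has exactly one 1 per row and column; the .to_dense() is only needed once sparse tensors are involved):

import torch

m, k, n = 6, 4, 6
z = torch.randn(m, k)
y = torch.randn(k, n)
M = torch.eye(n)[torch.randperm(n)]                 # one 1 per row and column

lhs = (z.t() * (y @ M.t())).sum(dim=0).diag() @ M
rhs = (z @ y) * M
print((lhs - rhs).abs().max())                      # ~0 up to machine precision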

I think PyTorch does support sparse x dense -> sparse via torch.mm. If you need dense x sparse -> sparse (because M will probably be sparse), you can use the identity AB = ((AB)^T)^T = (B^T A^T)^T.
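
A small sketch of that transpose trick, assuming torch.mm accepts a sparse first argument (here it returns a dense result, which you would mask or sparsify afterwards):

import torch

A = torch.randn(4, 6)                               # dense
M = torch.eye(6)[torch.randperm(6)].to_sparse()     # sparse
AM = torch.mm(M.t(), A.t()).t()                     # A @ M computed as (M^T A^T)^T
print((AM - A @ M.to_dense()).abs().max())          # ~0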

I haven’t actually tried out the expression above for your use case, though.

Thanks a lot, Richard! Unfortunately, the mask I have contains multiple ones in each row. For instance, for an M of size million x million there are about 300 ones in each row.
Is there any solution for this?

Any more suggestions on how to solve this?

If you have the locations of the 1s in your mask as rows and cols LongTensors, for example (your matrices are larger, but for demonstration purposes):

import torch

A = torch.randn(5, 10)
B = torch.randn(8, 10)
# 16 requested (row, column) positions
rows = torch.arange(16) % 5
cols = (torch.arange(16) // 2) % 8
idxes = torch.stack([rows, cols], dim=0)
mask = torch.sparse_coo_tensor(idxes, torch.ones(16), (5, 8)).to_dense()

how about doing

vals = (A[rows] * B[cols]).sum(1)               # dot products at the requested positions only
res = torch.sparse_coo_tensor(idxes, vals, (5, 8))

Then you see that the difference to the masked product is 0 up to machine precision:

print((res.to_dense()-(A@B.t()) * mask).abs().max())

Now this will have quite a bit of memory overhead, but you could split it into chunks of nonzero entries and cat them together afterwards if needed.
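
For example, continuing the snippet above (a rough sketch; in practice the chunks would be much larger):

chunk = 4
vals = torch.cat([
    (A[rows[i:i + chunk]] * B[cols[i:i + chunk]]).sum(1)
    for i in range(0, rows.numel(), chunk)
])
res = torch.sparse_coo_tensor(idxes, vals, (5, 8))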

Best regards

Thomas

I implemented what you said using Numba CUDA; see: GitHub - shenshanf/matmul_with_mask: matrix multiplication with sparse mask that indicate the operation position
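
The core idea is a kernel along these lines (a rough sketch, not the actual code from that repo), with one thread per requested (row, col) position:

import numpy as np
from numba import cuda

@cuda.jit
def masked_matmul_kernel(A, B, rows, cols, out):
    i = cuda.grid(1)                     # one thread per requested position
    if i < rows.shape[0]:
        r = rows[i]
        c = cols[i]
        acc = 0.0
        for k in range(A.shape[1]):
            acc += A[r, k] * B[c, k]     # row r of A dotted with row c of B
        out[i] = acc

A = np.random.randn(5, 10).astype(np.float32)
B = np.random.randn(8, 10).astype(np.float32)
rows = np.array([0, 1, 4], dtype=np.int64)
cols = np.array([3, 0, 7], dtype=np.int64)
out = np.zeros(rows.size, dtype=np.float32)

threads = 128
blocks = (rows.size + threads - 1) // threads
masked_matmul_kernel[blocks, threads](A, B, rows, cols, out)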

See this API function: torch.sparse.sampled_addmm
https://pytorch.org/docs/stable/generated/torch.sparse.sampled_addmm.html#torch.sparse.sampled_addmm
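
A minimal usage sketch (the mask has to be given as a sparse CSR matrix; since C here holds zero values, the result is just A @ B.t() sampled at the mask positions; note that on older PyTorch versions this op may be CUDA-only):

import torch

A = torch.randn(1000, 300)
B = torch.randn(1000, 300)

# mask positions as a sparse CSR matrix holding zero values
rows = torch.randint(0, 1000, (5000,))
cols = torch.randint(0, 1000, (5000,))
C = torch.sparse_coo_tensor(
    torch.stack([rows, cols]), torch.zeros(5000), (1000, 1000)
).coalesce().to_sparse_csr()

# beta * C + alpha * (A @ B.t()), evaluated only at C's sparsity pattern
res = torch.sparse.sampled_addmm(C, A, B.t())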