Hi there, I am working on a program that needs to multiply a matrix by a vector. The size of the matrix is 2^{18} x 2^{18}, and the size of the vector is, of course, 2^{18} x 1. Due to memory limitations, I have been using a sparse matrix. However, the computational speed is not satisfying, and I’d like to know if there is a better option for handling this matrix.

Some features of my matrix are:

The sparsity of the matrix is about 99.5%.

It’s a diagonal symmetric matrix.

It contains only 0s and 1s.

It can be separated into 512 x 512 blocks (each of size 512 x 512); 15757 of these blocks contain non-zero elements (the percentage of non-zero blocks is about 6%).

I’m using the COO format right now; obviously it’s not the best choice for this job. I am also considering the BSR (or BSC) format. I guess the BSR format would have an advantage in terms of computational speed, but it would run out of memory.
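As a rough back-of-the-envelope (my own estimate, not a measurement), the index storage alone is already sizable at this scale, assuming PyTorch’s default int64 COO indices and float32 values:

```python
# Rough memory estimate for the COO representation of the matrix
# described above (an assumption-laden sketch, not a profiler result).
n = 2**18                      # matrix is n x n
density = 1.0 - 0.995          # 99.5% sparse -> ~0.5% nonzero
nnz = int(n * n * density)     # roughly 3.4e8 nonzero entries

coo_index_bytes = 2 * 8 * nnz  # two int64 indices (row, col) per nonzero
coo_value_bytes = 4 * nnz      # one float32 value per nonzero

print(f"nnz ~ {nnz:.3e}")
print(f"COO indices ~ {coo_index_bytes / 2**30:.2f} GiB")
print(f"COO values  ~ {coo_value_bytes / 2**30:.2f} GiB")
```

By this estimate the indices dominate (around 5 GiB), which is why the choice of sparse layout matters so much here.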

I wonder if there is a better data type supported by PyTorch for the above matrix, especially one that takes advantage of its symmetry and its “binary” values.

Because your matrix is binary, the indices in your COO-format matrix map,
in effect, positions in your input vector to positions in your result vector. So
you can use PyTorch’s indexing facilities to effect the matrix-vector multiplication
(without actually multiplying anything).

Here’s such a scheme (that doesn’t take advantage of the symmetry of the
matrix):

>>> import torch
>>> print (torch.__version__)
1.13.0
>>>
>>> _ = torch.manual_seed (2022)
>>>
>>> n = 16
>>> m = torch.randint (2, (n, n)).float() # matrix of zeros and ones (not symmetric)
>>> ms = m.to_sparse() # sparse (coo) version of m
>>> v = torch.randn (n)
>>>
>>> resultA = m @ v
>>> resultB = ms @ v
>>>
>>> torch.allclose (resultB, resultA) # verify that sparse-matrix multiplication works
True
>>>
>>> # matrix-vector "multiplication" that uses sparseness and binary nature of ms
>>> resultC = torch.zeros_like (v)
>>> resultC.index_put_ ((ms.indices()[0],), v[ms.indices()[1]], accumulate = True)
tensor([ -7.7993, -1.3386, -1.8570, 0.1574, -5.2830, -5.6382, -10.0355,
-2.1667, -1.6088, -6.8637, -6.9376, -6.2652, -4.2707, -2.4600,
-3.9741, -1.9541])
>>>
>>> torch.allclose (resultC, resultA) # verify that indexing approach works
True
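An equivalent way to write the same gather–scatter, for what it’s worth, is with `index_add_()`, which behaves here like `index_put_()` with `accumulate = True` (a self-contained sketch of the same setup):

```python
import torch

torch.manual_seed(2022)
n = 16
m = torch.randint(2, (n, n)).float()   # matrix of zeros and ones
ms = m.to_sparse()                     # sparse (COO) version of m
v = torch.randn(n)

rows, cols = ms.indices()              # COO row and column indices
# gather v at the column positions, scatter-add into the row positions
resultD = torch.zeros_like(v).index_add_(0, rows, v[cols])

print(torch.allclose(resultD, m @ v))  # should print True
```

Which of the two spellings is faster is hardware- and version-dependent; they express the same operation.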

Logically, this seems to me to be about the best you can do (neglecting
the symmetry of the matrix). However, how well the indexing operations
map to the gpu (or cpu) pipelines will likely be the most important factor
in the actual performance of this scheme.

What do you mean by “diagonal symmetric?” Is it the same as what I would
simply call “symmetric?” Something else?

Unless there is further structure within the blocks that you haven’t
mentioned, I don’t see any way to make use of this. From the numbers
you give, the sparseness of the values within the non-zero blocks is about
the same as the sparseness of the blocks within the big matrix, so the
overall sparseness doesn’t seem to have any real “block” structure to it.

It might be possible to get some savings from the matrix’s symmetry, but
at most that would be a factor of two, which might be lost again by the need
to treat diagonal and off-diagonal elements differently.
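For what it’s worth, here is what exploiting the symmetry might look like (a sketch of my own, assuming the matrix really is symmetric): store only the strictly-upper-triangular indices, scatter once in each direction, and handle the diagonal separately so it isn’t counted twice.

```python
import torch

torch.manual_seed(2022)
n = 16
a = torch.randint(2, (n, n))
m = ((a + a.t()) > 0).float()          # symmetric 0/1 matrix
v = torch.randn(n)

rows, cols = m.to_sparse().indices()
keep = rows < cols                     # strictly upper-triangular entries
ur, uc = rows[keep], cols[keep]        # stored once instead of twice
diag = torch.nonzero(m.diagonal()).squeeze(1)  # diagonal positions holding a 1

result = torch.zeros_like(v)
result.index_add_(0, ur, v[uc])        # upper triangle
result.index_add_(0, uc, v[ur])        # mirrored lower triangle
result[diag] += v[diag]                # diagonal, counted once

print(torch.allclose(result, m @ v))   # should print True
```

This halves the stored indices, but, as noted, the extra passes over the data may eat the savings.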