Slow sparse matrix multiplication

I am implementing a sparse random projection, using a random sparse matrix (Li, Ping, Trevor J. Hastie, and Kenneth W. Church. “Very sparse random projections.”).

This operation should in principle be much faster than a dense random projection, but experimentally I get the converse. In my example, I am multiplying a 5k vector by a 10k*5k matrix. When using a dense matrix, doing, vec) executes in around 500 microseconds, while using a sparse matrix,, vec) executes in 40 ms.
If I further increase the size, the dense multiplication is still much faster than the sparse one.

Does somebody have some insights in why this happens?


Have you made progress on this issue?