Sparse Feed-Forward NN

I’m trying to implement a 1-layer sparse network.

The model should receive a high-dimensional input vector (dimension N; sparse, with 1% or fewer nonzero entries), process it with a custom layer, and map it to an output of the same dimension.
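For concreteness, here is roughly what I mean by the input, as a minimal sketch with made-up sizes (assuming a COO layout via `torch.sparse_coo_tensor` is an appropriate way to represent it):

```python
import torch

N = 1_000_000                 # illustrative dimension, not my real one
nnz = N // 100                # ~1% nonzero entries
idx = torch.randint(0, N, (1, nnz))
val = torch.randn(nnz)
x = torch.sparse_coo_tensor(idx, val, (N,))  # sparse input vector
```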

The problem is that a dense N x N weight matrix (N^2 entries) won’t fit on any GPU, and in any case I want to set most of the weights to 0.
I’m completely new to PyTorch, so before trying to implement this I would like to know whether it has reasonable support for this case (sparse input x sparse matrix --> sparse output), and for computing SGD updates on the sparse weight matrix, ideally on CUDA or at least across multiple CPU cores. (I have working code in Theano, but its sparse matrices don’t support any parallelism.)
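To make the question concrete, the sketch below is the kind of layer I have in mind: the sparsity pattern is fixed up front, and only the nonzero values are learnable. The class name `SparseLinear` and all sizes are made up, and I’m not sure this is idiomatic PyTorch:

```python
import torch
import torch.nn as nn

class SparseLinear(nn.Module):
    """N -> N layer with a fixed sparsity pattern: only the weights
    listed in `indices` exist; all other entries are implicitly zero."""
    def __init__(self, n, indices):
        super().__init__()
        self.n = n
        self.register_buffer("indices", indices)              # 2 x nnz
        self.values = nn.Parameter(0.01 * torch.randn(indices.shape[1]))

    def forward(self, x):
        # Rebuild the sparse weight every forward pass so autograd
        # tracks gradients w.r.t. the nonzero values only.
        w = torch.sparse_coo_tensor(self.indices, self.values,
                                    (self.n, self.n))
        # torch.sparse.mm does (sparse @ dense) -> dense, so the input
        # is densified here; I don't know whether sparse @ sparse with
        # a sparse output is supported at all.
        return torch.sparse.mm(w, x)
```

And roughly how I imagine using it, with a dense stand-in for the sparse input:

```python
n, nnz, batch = 50_000, 250_000, 8                # made-up sizes
layer = SparseLinear(n, torch.randint(0, n, (2, nnz)))
x = torch.randn(n, batch)                         # dense for this sketch
out = layer(x)                                    # (n, batch)
out.sum().backward()                              # grads land on layer.values only
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
opt.step()                                        # updates only the nnz values
```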

Thanks a lot!