# Equivalence of two approaches for sparsity

Hi Everyone,
I am trying to understand whether the following two ways of achieving a sparse weight matrix are equivalent in PyTorch. Let me add some context:

I am training Sparse Neural Networks with a specific structured sparsity pattern. To do so, there are two methods that I have looked at.

1. Parameterized Approach

In this approach, P_j is a permutation matrix with 1s in the places where the weights are supposed to be non-zero in the final W matrix. Each P_j is pre-defined and not trainable, no two P matrices overlap, and each P has 1s in D positions.
alpha is a vector of size J whose entries are either 1 or 0. It dictates which permutation matrices are selected to form the final W matrix.
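If I understand the setup correctly, the parameterized approach amounts to W = sum_j alpha_j * (P_j ⊙ W_dense). A minimal sketch of that formulation; the toy dimensions (J = 4, a 4x4 weight) and the shift-based construction of the P_j are my own assumptions for illustration:

```python
import torch

torch.manual_seed(0)
J, n = 4, 4
# J disjoint permutation matrices, built here by cyclically shifting the
# identity; any disjoint set of 0/1 permutation matrices would do.
P = torch.stack([torch.roll(torch.eye(n), shifts=j, dims=1) for j in range(J)])
alpha = torch.tensor([1., 0., 1., 0.])           # fixed 0/1 vector, not trainable
W_dense = torch.randn(n, n, requires_grad=True)  # trainable parameter

# Parameterized approach: W = sum_j alpha_j * (P_j * W_dense)
W = (alpha.view(J, 1, 1) * P * W_dense).sum(dim=0)
```

With two alphas set to 1 and D = 4 ones per P_j, W ends up with 2 x 4 = 8 non-zero entries, and gradients flow back into `W_dense` only through the selected positions.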

2. Masking Approach

This approach also uses the P matrices to decide the positions of the non-zeros in the final W matrix.
M is the final mask obtained by adding K permutation matrices.
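The masking approach, as I read it, builds the binary mask M once and multiplies it into the weight in the forward pass. A sketch under the same assumptions as above (toy dimensions, hypothetical shift-based P_j):

```python
import torch

torch.manual_seed(0)
J, n = 4, 4
P = torch.stack([torch.roll(torch.eye(n), shifts=j, dims=1) for j in range(J)])

chosen = [0, 2]              # indices j where alpha_j = 1
M = P[chosen].sum(dim=0)     # binary mask, since the P_j are disjoint
W_dense = torch.randn(n, n, requires_grad=True)

# Masking approach: apply the fixed mask to the trainable weight
W = W_dense * M
```

Because M is a constant 0/1 tensor, the gradient w.r.t. the masked-out entries of `W_dense` is exactly zero, so those entries never move during training.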

Comparison
To compare the two methods, we ensure that the resulting W has the same number of non-zeros in both cases. Let’s take K = 30 as an example.

1. In the first approach, we randomly pick 30 alphas to be 1 and set the rest to zero. This gives a W matrix with 30 x D non-zeros.
2. For the second approach, we use the indices of the randomly picked alphas and pick the corresponding P matrices to form M.
3. I also set the same random seed in both experiments so that W is initialized identically.
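For what it's worth, if the two formulations really are W = sum_j alpha_j * (P_j ⊙ W) and W = M ⊙ W with M = sum_j alpha_j * P_j, then they are algebraically identical whenever the P_j are disjoint, and autograd should produce the same gradients for both. A toy sanity check (dimensions and P_j construction are my own assumptions):

```python
import torch

torch.manual_seed(0)
J, n = 4, 4
P = torch.stack([torch.roll(torch.eye(n), shifts=j, dims=1) for j in range(J)])
alpha = torch.zeros(J)
alpha[[0, 2]] = 1.0

W1 = torch.randn(n, n, requires_grad=True)
W2 = W1.detach().clone().requires_grad_(True)   # identical initialization

Wa = (alpha.view(J, 1, 1) * P * W1).sum(dim=0)  # parameterized approach
Wb = W2 * P[alpha.bool()].sum(dim=0)            # masking approach

# Forward values and gradients should match exactly
assert torch.allclose(Wa, Wb)
Wa.sum().backward()
Wb.sum().backward()
assert torch.allclose(W1.grad, W2.grad)
```

If a check like this passes but training still diverges between the two runs, the difference is likely elsewhere (e.g. RNG consumption order, optimizer state, or which tensor is registered as the `nn.Parameter`), rather than in the gradient flow through the mask itself.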

I am using these weight formulations for an MLP on CIFAR-10. My MLP has the following specifications:

1. Layer1: 3072x3072 Weight Matrix
2. Layer2: 10x3072 Weight Matrix

In both experiments, I am sparsifying Layer1 and keeping Layer2 dense.
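For completeness, one common way to sparsify only Layer1 while keeping Layer2 dense is to register the mask as a buffer and apply it in the forward pass. This `MaskedMLP` is my own hypothetical sketch of such a setup, not the actual model code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedMLP(nn.Module):
    """Two-layer MLP: Layer1's weight is masked, Layer2 stays dense."""
    def __init__(self, in_dim=3072, hidden=3072, classes=10, mask=None):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, classes)
        # Fixed 0/1 mask for fc1's weight; defaults to all-ones (dense).
        self.register_buffer(
            "mask", mask if mask is not None else torch.ones(hidden, in_dim))

    def forward(self, x):
        # Apply the mask at every forward pass so masked weights never
        # contribute to the output or receive gradient.
        x = F.relu(F.linear(x, self.fc1.weight * self.mask, self.fc1.bias))
        return self.fc2(x)
```

Multiplying the mask inside `forward` (rather than zeroing the weight once) guarantees that optimizer updates with momentum or weight decay cannot resurrect masked entries.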

The dense accuracy of the network is 57.34% (not great, but this network is just a toy example for our sparsity research).

When I choose K = 30 and run the experiments, I get an accuracy of around 50.1% with the parameterized approach and 56.4% with the masking approach.

Does anyone have any suggestions on why that might be the case?
Are the two approaches equivalent in terms of PyTorch's implementation of gradient flow?
Any suggestions on how I can debug this?

@ptrblck I would appreciate any input on this. Thank you!