I am trying to train a pruned neural network with (say) 36% sparsity, meaning that 36% of the trainable parameters are 0s. I am using a simple LeNet-300-100 dense network for this:
import torch
import torch.nn as nn
import torch.nn.functional as F

class LeNet300(nn.Module):
    def __init__(self):
        super().__init__()
        # Define layers-
        self.fc1 = nn.Linear(in_features = 28 * 28 * 1, out_features = 300)
        self.fc2 = nn.Linear(in_features = 300, out_features = 100)
        self.output = nn.Linear(in_features = 100, out_features = 10)
        # self.weights_initialization()

    def forward(self, x):
        out = F.leaky_relu(self.fc1(x))
        out = F.leaky_relu(self.fc2(out))
        return self.output(out)
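For reference, a minimal sketch of how the model is instantiated and called, assuming the 28x28 inputs are flattened before the forward pass (the batch size here is purely illustrative, not part of my actual training code):

# Illustrative sanity check (batch size 32 is an assumption)
model = LeNet300()
x = torch.randn(32, 28 * 28 * 1)   # flattened 28x28 grayscale images
logits = model(x)                  # shape: (32, 10)
print(logits.shape)                # torch.Size([32, 10])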
With 36% sparsity, the number of surviving (non-zero) trainable parameters is 170,630; originally, without pruning, the network has 266,610 parameters.
The layer-wise sparsity looks like this:
for param in model.parameters():
    print(f"{param.size()} has {torch.count_nonzero(param)} surviving weights")

'''
torch.Size([300, 784]) has 146615 surviving weights
torch.Size([300]) has 300 surviving weights
torch.Size([100, 300]) has 22743 surviving weights
torch.Size([100]) has 100 surviving weights
torch.Size([10, 100]) has 862 surviving weights
torch.Size([10]) has 10 surviving weights
'''
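To double-check the overall figure, here is a small helper of my own (not from any library) that sums the non-zero entries over all parameters and reports the resulting sparsity:

def report_sparsity(model):
    # Count non-zero and total trainable parameters across all layers
    nonzero = sum(int(torch.count_nonzero(p)) for p in model.parameters())
    total = sum(p.numel() for p in model.parameters())
    sparsity = 100.0 * (1 - nonzero / total)
    print(f"{nonzero} / {total} surviving weights -> {sparsity:.2f}% sparsity")

report_sparsity(model)   # for the counts above: 170630 / 266610 -> ~36.00% sparsity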
The code I am using to preserve sparsity works as follows: in .grad (gradient_t), replace the computed gradients with 0s at those positions (within each weight matrix) where the trainable parameters (wts) have been pruned to zero:
# Compute loss-
J = loss(outputs, labels)

# Empty accumulated gradients-
optimizer.zero_grad()

# Perform backprop-
J.backward()

# Zero out gradients at pruned positions so those weights stay at 0-
for name, param in model.named_parameters():
    wts = param.data.clone().detach()
    gradient_t = param.grad
    gradient_t = torch.where(wts == 0., 0., gradient_t)
    param.grad = gradient_t

# Update parameters-
optimizer.step()
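To see whether the pruned positions actually stay at zero after the update, I can also count the non-zero entries immediately after optimizer.step() with a sketch like this (the check itself is my own addition):

# Sanity check: confirm that pruned weights are still zero after the update step
with torch.no_grad():
    for name, param in model.named_parameters():
        n_nonzero = int(torch.count_nonzero(param))
        print(f"{name}: {n_nonzero} / {param.numel()} non-zero after optimizer.step()")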
However, with this gradient masking in place, training basically freezes: the loss and accuracy stop changing. Why is this happening?