I’m trying to replicate the work of Han *et al.* (Learning both Weights and Connections for Efficient Neural Networks, 2015), where deep CNN models are compressed by pruning close-to-zero weights and then retraining the model. Training has two phases: in the first stage the model is trained as usual, and the resulting weights are used to identify connections below a certain threshold; those insignificant weights are then pruned, yielding a sparser model, and the remaining parameters are kept for a second, fine-tuning training session.
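For reference, this is roughly how I compute the pruning masks at the end of the first stage (the helper name `build_prune_masks` and the threshold value are just illustrative, not from the paper):

```python
import torch

def build_prune_masks(model, threshold=0.05):
    # For each parameter tensor, mark the entries whose magnitude falls
    # below the threshold; these are the connections to be pruned.
    # The result is a list of boolean masks, one per parameter tensor,
    # in the same order as model.parameters().
    masks = []
    for p in model.parameters():
        masks.append(p.data.abs() < threshold)
    return masks
```

The masks are boolean index tensors, so they can be used directly to zero out both the weights and (later) their gradients.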

My implementation idea in PyTorch: given the trained model from the first stage, I set the weights below the threshold to zero (their indices are remembered in `pruned_inds_by_layer`), and then start the second training stage, during which I don’t allow any gradient to be back-propagated to those zero-valued weights. But modifying `p.grad.data` as below doesn’t seem to do the job: the zero-valued weights still get updated and become non-zero again. Any idea how to solve this problem?

```python
optimizer.zero_grad()
outputs = cnn(images)
loss = criterion(outputs, labels)
loss.backward()
# zero out all the gradients corresponding to the pruned connections
for l, p in enumerate(cnn.parameters()):
    pruned_inds = pruned_inds_by_layer[l]
    p.grad.data[pruned_inds] = 0.
optimizer.step()
```
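One fallback I can think of (my own idea, not from the paper): even with masked gradients, an optimizer with momentum or weight decay can still move a pruned weight, since the update is not purely the current gradient. So the mask could be re-applied to the weights themselves right after each `optimizer.step()`. A minimal self-contained sketch, with illustrative names and values:

```python
import torch
import torch.nn as nn

# Toy setup standing in for my real model; `pruned_inds_by_layer` holds one
# boolean mask per parameter tensor, as in the snippet above.
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
pruned_inds_by_layer = [p.data.abs() < 0.05 for p in model.parameters()]

x, y = torch.randn(8, 4), torch.randn(8, 2)
optimizer.zero_grad()
loss = nn.functional.mse_loss(model(x), y)
loss.backward()

# zero out the gradients of the pruned connections, as before
for l, p in enumerate(model.parameters()):
    p.grad.data[pruned_inds_by_layer[l]] = 0.
optimizer.step()

# hard reset: force the pruned weights back to exactly zero, so momentum
# or weight decay cannot revive them across steps
with torch.no_grad():
    for l, p in enumerate(model.parameters()):
        p.data[pruned_inds_by_layer[l]] = 0.
```

This keeps the pruned weights at zero by construction, though I’d still prefer to understand why zeroing `p.grad.data` alone isn’t enough.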