I’m trying to replicate the work of Han et al. (Learning both Weights and Connections for Efficient Neural Networks, 2015), where deep CNN models are compressed by pruning close-to-zero weights and then retraining the model. There are two training phases: in the first stage the model is trained as usual, and the result is used to identify weights below a certain threshold; those insignificant weights are then pruned, yielding a simpler model, and the remaining parameters are kept for a second, fine-tuning training session.
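For reference, the pruning step between the two phases boils down to something like this (a rough sketch, assuming cnn is the trained model from the first stage; the threshold value is made up):

```
threshold = 1e-2  # hypothetical magnitude threshold

for p in cnn.parameters():
    # connections whose magnitude falls below the threshold get pruned
    mask = p.data.abs() < threshold
    p.data[mask] = 0.
```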
My implementation idea in PyTorch is: given the trained model from the first stage, I set the weights below the threshold to zero (their positions are recorded in pruned_inds_by_layer) and then start the second training stage, during which I don’t allow any gradient to be back-propagated to those zero-valued weights. But modifying p.grad.data as shown below doesn’t seem to do the job: the zero-valued weights still end up with gradients, which makes them non-zero again. Any idea how to solve this problem?
```
optimizer.zero_grad()
outputs = cnn(images)
loss = criterion(outputs, labels)
loss.backward()

# zero out all the gradients corresponding to the pruned connections
for l, p in enumerate(cnn.parameters()):
    pruned_inds = pruned_inds_by_layer[l]
    p.grad.data[pruned_inds] = 0.

optimizer.step()
```
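For what it’s worth, this is roughly how I check that the pruned weights drift away from zero after a few iterations of the loop above (a sketch, assuming each entry of pruned_inds_by_layer is a boolean mask over the corresponding parameter tensor):

```
# after running a few training iterations
for l, p in enumerate(cnn.parameters()):
    pruned_inds = pruned_inds_by_layer[l]
    # count pruned positions that are no longer zero
    num_nonzero = int((p.data[pruned_inds] != 0).sum())
    print('layer %d: %d pruned weights are non-zero again' % (l, num_nonzero))
```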