I’m trying to prune neurons from a feed-forward layer in PyTorch while training.
So after an epoch, I remove a row from the weight matrix of layer 2, and I remove a column and the matching bias entry from the weight matrix of layer 1.
Something along the lines of:
self.W = Parameter(self.W[inverse_mask, :].data)
This operation works, and the forward calculation happens as planned.
However, self.W no longer gets updated afterwards. Does anyone have an idea how I can ‘reregister’ the parameter, or change the data of the parameter at runtime in such a way that it works?
- I tried to change the data itself, but that runs into gradient problems, because the old sizes are still stored.
- I tried to register the parameter in __init__ as self.register_parameter('W', None), but that didn’t help.
Small update: I figured out that the gradient problem is mainly caused by Adam storing per-parameter momentum buffers, which would also have to decrease in size. However, hacking around with that seems suboptimal. Any suggestions?
You will have to reconstruct the optimizer if you change the shapes of the parameters. There’s no real workaround.
Alternatively, you can simply slice self.W in the forward function and keep its original shape intact.
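To make the "reconstruct the optimizer" route concrete, here is a minimal sketch. It is not the original poster's code: `TwoLayer`, `prune_hidden` and `keep` are hypothetical names, and it uses `nn.Linear`, which stores its weight as (out_features, in_features), so the row/column roles may be swapped relative to a hand-written `self.W`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class TwoLayer(nn.Module):
    """Hypothetical two-layer net used only for illustration."""
    def __init__(self, n_in=4, n_hidden=6, n_out=3):
        super().__init__()
        self.fc1 = nn.Linear(n_in, n_hidden)
        self.fc2 = nn.Linear(n_hidden, n_out)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

def prune_hidden(model, keep):
    """Drop hidden units where the boolean mask `keep` is False."""
    with torch.no_grad():
        w1 = model.fc1.weight[keep, :].clone()  # rows of fc1 = hidden units
        b1 = model.fc1.bias[keep].clone()
        w2 = model.fc2.weight[:, keep].clone()  # columns of fc2 = hidden units
    # Assigning an nn.Parameter to a module attribute re-registers it.
    model.fc1.weight = nn.Parameter(w1)
    model.fc1.bias = nn.Parameter(b1)
    model.fc2.weight = nn.Parameter(w2)
    model.fc1.out_features = int(keep.sum())
    model.fc2.in_features = int(keep.sum())

model = TwoLayer()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

keep = torch.tensor([True, True, False, True, True, False])
prune_hidden(model, keep)

# The old optimizer still points at the old tensors (and old Adam state),
# so build a fresh one over the new parameters.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

x, y = torch.randn(8, 4), torch.randn(8, 3)
before = model.fc1.weight.detach().clone()
loss = nn.functional.mse_loss(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
updated = not torch.equal(before, model.fc1.weight.detach())
```

The cost, as noted in the thread, is that Adam's accumulated state is thrown away each time the optimizer is rebuilt.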
out = torch.mm(input, self.W[inverse_mask, :]) # just an example
Here you don’t repackage or change self.W; you just enforce a mask at runtime.
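A rough sketch of that masking idea, under some assumptions: `MaskedLayer` and the `keep` buffer are made-up names, W is laid out as (n_in, n_out), and the layer slices its own input so the example is self-contained (in the line above, the input is presumably already reduced).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class MaskedLayer(nn.Module):
    """Hypothetical layer: W keeps its full shape, a mask picks active rows."""
    def __init__(self, n_in, n_out):
        super().__init__()
        self.W = nn.Parameter(torch.randn(n_in, n_out) * 0.1)
        # A buffer is saved with the module but is not a trainable parameter.
        self.register_buffer("keep", torch.ones(n_in, dtype=torch.bool))

    def forward(self, x):
        # x: (batch, n_in). Only the kept input units and matching rows of W are used.
        return torch.mm(x[:, self.keep], self.W[self.keep, :])

layer = MaskedLayer(5, 3)
optimizer = torch.optim.Adam(layer.parameters(), lr=1e-2)

layer.keep[1] = False  # "prune" input unit 1 without touching W's shape

out = layer(torch.randn(8, 5))
optimizer.zero_grad()
out.sum().backward()
masked_grad = layer.W.grad[1].clone()  # gradient of the masked-out row
optimizer.step()  # same optimizer, no rebuild needed
```

The masked-out rows receive a zero gradient. An optimizer with momentum or weight decay may still nudge them for a while, but that is harmless since they no longer take part in the forward pass. The trade-off is that no memory or compute is actually saved in the parameter itself.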
Is there any reason not to just reinitialize the optimizer?
(except for the loss of momentum)
If we use an optimizer like plain SGD, which keeps no shape-dependent state, then it should be no problem, right?
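One thing worth checking, even with plain SGD: the optimizer holds references to the Parameter objects it was constructed with, so after self.W is replaced by a new Parameter the old optimizer keeps pointing at the old tensor. A small sketch (the `Layer` class and `keep` mask are hypothetical) that shows the stale reference and that re-creating the optimizer fixes it:

```python
import torch
import torch.nn as nn

class Layer(nn.Module):
    """Hypothetical layer with a hand-written weight, as in the question."""
    def __init__(self):
        super().__init__()
        self.W = nn.Parameter(torch.ones(4, 2))

    def forward(self, x):
        return x @ self.W

layer = Layer()
sgd = torch.optim.SGD(layer.parameters(), lr=0.1)

# Prune row 1 by replacing the parameter with a smaller one.
keep = torch.tensor([True, False, True, True])
layer.W = nn.Parameter(layer.W.detach()[keep, :].clone())

# The optimizer still holds the old (4, 2) parameter object.
is_stale = sgd.param_groups[0]["params"][0] is not layer.W

layer(torch.ones(5, 3)).sum().backward()
sgd.step()  # steps the old parameter, which has no gradient
unchanged = torch.equal(layer.W.detach(), torch.ones(3, 2))

# Re-create the optimizer over the current parameters and step again.
sgd = torch.optim.SGD(layer.parameters(), lr=0.1)
sgd.step()  # uses the gradient already stored on the new W
```

So with momentum-free SGD nothing of value is lost by reinitializing, but the reinitialization (or an equivalent update of the optimizer's parameter list) is still needed for the new parameter to be trained.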