I'm trying to prune neurons from a feed-forward layer in PyTorch during training.
So after each epoch, I remove a row from the weight matrix of layer 2, and I remove the corresponding column and bias entry from the matrix of layer 1.
Something along the lines of:
self.W = Parameter(self.W[inverse_mask, :].data)  # keep only the rows selected by the mask
This operation works, and the forward pass computes as planned.
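For context, here is roughly what the full pruning step looks like. Everything below (the class name, W1/b1/W2, prune) is just an illustrative sketch of my setup, where each layer computes x @ W + b:

import torch
from torch.nn import Parameter

class PrunableMLP(torch.nn.Module):
    # layer 1 computes x @ W1 + b1, layer 2 computes h @ W2,
    # so W1 has shape (n_in, n_hidden) and W2 has shape (n_hidden, n_out)
    def __init__(self, n_in, n_hidden, n_out):
        super().__init__()
        self.W1 = Parameter(torch.randn(n_in, n_hidden))
        self.b1 = Parameter(torch.zeros(n_hidden))
        self.W2 = Parameter(torch.randn(n_hidden, n_out))

    def forward(self, x):
        h = torch.relu(x @ self.W1 + self.b1)
        return h @ self.W2

    def prune(self, inverse_mask):
        # inverse_mask: bool tensor of shape (n_hidden,), True = keep neuron
        self.W1 = Parameter(self.W1.data[:, inverse_mask])  # drop incoming columns
        self.b1 = Parameter(self.b1.data[inverse_mask])     # drop bias entries
        self.W2 = Parameter(self.W2.data[inverse_mask, :])  # drop outgoing rows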
However, self.W does not get updated anymore. Does anyone have an idea how I can 're-register' the parameter, or change the parameter's data at runtime in a way that keeps it trainable?
I tried to change the data itself, but that runs into gradient problems, because buffers with the old sizes are still stored.
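Concretely, the in-place attempt looked something like this:

# attempt: swap the underlying tensor instead of rebuilding the Parameter
self.W.data = self.W.data[inverse_mask, :]
# this breaks later: self.W.grad (if still allocated) and the optimizer's
# per-parameter buffers keep the old shape, so backward()/step() hits a size mismatch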
I tried to register the parameter in __init__ as self.register_parameter('W', None), but that didn't help.
Small update: I figured out that the gradient problem is mainly because Adam stores momentum buffers per parameter, and those would also need to shrink. However, hacking around with that seems suboptimal. Any suggestions?
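For reference, the hack I mean is slicing Adam's state down to match the pruned parameter. A rough sketch (shrink_adam_state and its arguments are my own names; exp_avg and exp_avg_sq are the buffers torch.optim.Adam keeps per parameter):

def shrink_adam_state(optimizer, old_param, new_param, inverse_mask):
    # move Adam's state from the old Parameter to the new, smaller one,
    # slicing the momentum buffers down to the kept rows
    state = optimizer.state.pop(old_param, None)
    if state is not None:
        state['exp_avg'] = state['exp_avg'][inverse_mask, :]
        state['exp_avg_sq'] = state['exp_avg_sq'][inverse_mask, :]
        optimizer.state[new_param] = state
    # param_groups still hold a reference to the old tensor, so swap that too
    for group in optimizer.param_groups:
        group['params'] = [new_param if p is old_param else p
                           for p in group['params']]

It reaches into optimizer internals (and amsgrad's max_exp_avg_sq would need the same treatment), which is why it feels suboptimal to me.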