Dynamically changing parameter sizes

Tijmen_Blankevoort · October 9, 2017, 9:51pm

Hey all!

I’m trying to prune neurons from a feed-forward layer in PyTorch while training.
So after an epoch, I remove a row from the weight matrix of layer 2, and I remove the corresponding column and bias entry from layer 1.

Something along the lines of:
self.W = Parameter(self.W[inverse_mask, :].data)

This operation works, and calculation happens as planned.

However, self.W no longer gets updated after this. Does anyone have an idea how I can ‘reregister’ the parameter, or change the data of the parameter at runtime in such a way that it works?

I tried to change the data itself, but that runs into gradient problems because the old sizes are still stored.
I also tried to register the parameter in init as self.register_parameter('W', None), but that didn’t help.

Cheers,
Tijmen

Tijmen_Blankevoort · October 10, 2017, 11:03am

Small update. I figured out that the gradient problem is mainly because of Adam storing momentum parameters that should also decrease in size. However, hacking around with that seems suboptimal. Any suggestions?
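
For anyone who wants to see this for themselves, here is a minimal sketch of how Adam's per-parameter state mirrors the parameter shape. The tensor sizes and the toy loss are assumptions for illustration only; they are not from the original model.

```python
import torch
from torch import nn

# Hypothetical weight matrix; the 5x3 size is illustrative only.
W = nn.Parameter(torch.randn(5, 3))
opt = torch.optim.Adam([W], lr=1e-3)

# Take one step so that Adam allocates its state for W.
loss = (W ** 2).sum()
loss.backward()
opt.step()

# Adam keeps running averages with the same shape as the parameter.
state = opt.state[W]
exp_avg_shape = tuple(state["exp_avg"].shape)
exp_avg_sq_shape = tuple(state["exp_avg_sq"].shape)
print(exp_avg_shape, exp_avg_sq_shape)
```

Both buffers have the shape of W, so shrinking W without touching them leaves the optimizer state with mismatched sizes.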

smth · October 11, 2017, 4:42am

You will have to reconstruct the optimizer if you change the shapes of the parameters. There’s no real workaround.
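
A rough sketch of what reconstructing the optimizer after pruning could look like. The `TwoLayer` module, the `prune_hidden` helper, and all sizes are hypothetical names made up for this example; it also assumes `nn.Linear` layers, which store the weight as (out_features, in_features).

```python
import torch
from torch import nn

class TwoLayer(nn.Module):
    """Hypothetical two-layer network used only for this sketch."""
    def __init__(self, n_in=4, n_hidden=6, n_out=2):
        super().__init__()
        self.fc1 = nn.Linear(n_in, n_hidden)
        self.fc2 = nn.Linear(n_hidden, n_out)

    def forward(self, x):
        return self.fc2(torch.tanh(self.fc1(x)))

def prune_hidden(model, keep):
    """Keep only the hidden units where the boolean mask `keep` is True."""
    # Assigning an nn.Parameter to a module attribute re-registers it.
    model.fc1.weight = nn.Parameter(model.fc1.weight.data[keep, :].clone())
    model.fc1.bias = nn.Parameter(model.fc1.bias.data[keep].clone())
    model.fc2.weight = nn.Parameter(model.fc2.weight.data[:, keep].clone())
    model.fc1.out_features = int(keep.sum())
    model.fc2.in_features = int(keep.sum())

model = TwoLayer()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
x = torch.randn(8, 4)

# One ordinary training step before pruning.
model(x).sum().backward()
opt.step()

# Prune two hidden units, then build a fresh optimizer over the new
# Parameter objects. The old Adam state (momentum etc.) is discarded.
keep = torch.tensor([True, False, True, True, False, True])
prune_hidden(model, keep)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

before = model.fc1.weight.detach().clone()
opt.zero_grad()
model(x).sum().backward()
opt.step()
updated = not torch.equal(before, model.fc1.weight.detach())
```

The cost of this approach is that the running averages for the surviving units are lost as well, since the whole optimizer state is thrown away.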

Alternatively, you can simply slice self.W in the forward function and keep its original shape intact.

Like:

out = torch.mm(input, self.W[inverse_mask, :]) # just an example

Here you don’t repackage or change self.W; you just enforce a mask at runtime.
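
To make the masking idea a bit more concrete, here is a small sketch. The `MaskedLinear` class, its `prune` method, and the choice to mask output columns (rather than rows, as in the one-liner above) are assumptions for illustration, not an existing PyTorch API.

```python
import torch
from torch import nn

class MaskedLinear(nn.Module):
    """Hypothetical layer: W keeps its full shape, pruning is a runtime mask."""
    def __init__(self, n_in, n_out):
        super().__init__()
        self.W = nn.Parameter(torch.randn(n_in, n_out) * 0.1)
        self.b = nn.Parameter(torch.zeros(n_out))
        # A buffer, not a Parameter: stored with the module but not trained.
        self.register_buffer("keep", torch.ones(n_out, dtype=torch.bool))

    def prune(self, idx):
        # Mark the given output units as pruned.
        self.keep[idx] = False

    def forward(self, x):
        # Slice at runtime; self.W itself is never replaced or resized.
        return torch.mm(x, self.W[:, self.keep]) + self.b[self.keep]

layer = MaskedLinear(3, 5)
opt = torch.optim.Adam(layer.parameters(), lr=1e-2)

layer.prune([1, 4])
out = layer(torch.randn(2, 3))
out.sum().backward()
opt.step()  # optimizer state still matches W, so no rebuild is needed
```

The pruned columns receive zero gradient because they never take part in the forward pass. Note that the output of the layer shrinks, so whatever consumes it has to slice its own weights the same way.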

roee · March 19, 2018, 1:57pm

Is there any reason not to just reinitialize the optimizer?
(except for the loss of momentum)

dipfit · April 2, 2021, 8:14am

Assuming we use an optimizer like SGD, which has no shape-dependent parameters, it should be no problem, right?
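
One way to check this is the small experiment below. It is a sketch under assumptions: plain SGD without momentum, toy tensors, and a new Parameter object created the same way as in the first post. The thing it tests is that the optimizer holds references to the Parameter objects it was constructed with.

```python
import torch
from torch import nn

W = nn.Parameter(torch.ones(4, 2))
opt = torch.optim.SGD([W], lr=0.1)  # no momentum, so no per-parameter buffers

# Replace W by a smaller Parameter, as in self.W = Parameter(self.W[mask, :].data).
keep = torch.tensor([True, False, True, True])
W_new = nn.Parameter(W.data[keep, :].clone())

# The old optimizer only knows about the old W object, so W_new is not updated.
W_new.sum().backward()
opt.step()
stale_unchanged = torch.equal(W_new.detach(), torch.ones(3, 2))

# After rebuilding the optimizer over W_new, the same gradient is applied.
opt = torch.optim.SGD([W_new], lr=0.1)
opt.step()
rebuilt_updated = torch.allclose(W_new.detach(), torch.full((3, 2), 0.9))
print(stale_unchanged, rebuilt_updated)
```

So in this setup the shapes are not the issue for plain SGD, but the optimizer still has to be told about the new Parameter object, for example by constructing it again.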