Modifying/Custom Optimizer: Updating Layer Parameters

I’m working on a modification to a PyTorch optimizer file. For simplicity’s sake, let’s take SGD (sgd.py). If I wanted to change the “for loop” on line 94 to:

for i, p in enumerate(group['params']):

And add something starting on line 111 in the for loop like:

if i == 0:
    bias = torch.rand(10)
    group['params'][i + 1] = bias

Also, suppose that the number of biases in this layer is 8 before going through this ‘optimizer’. Currently, it doesn’t update the parameters when I run this. What would I need to change in order to make the tensor size change?

P.S. The example above is not the full code but is just for example.
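For context, here is a rough sketch of where such a change would sit inside SGD.step() (hypothetical, not the actual sgd.py source; line numbers differ across versions):

# rough sketch only: the real step() also handles momentum, weight decay, etc.
for group in self.param_groups:
    for i, p in enumerate(group['params']):
        if p.grad is None:
            continue
        # ... usual SGD update of p ...
        if i == 0:
            bias = torch.rand(10)          # hypothetical replacement tensor
            group['params'][i + 1] = bias  # rebinds only the optimizer's list entry,
                                           # not the model's own parameter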

Update: So the issue isn’t the selector. It just isn’t passing the new parameters through regardless.

My main goal here is just to be able to update the size and values of a layer’s parameters dynamically between epochs. I’m open to whatever approach accomplishes that, whether that means writing an optimizer or something simpler; just point me in the right direction. Everything I have tried so far works on tensors but does not work on iterators. Where are the parameters and layer shapes stored, and how can I access and change them?

The optimizers use references to the parameters. If you are changing these parameters, I would recreate the optimizer instead of trying to manipulate the references somehow.
Note that running estimates would be reset using this workflow.
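For example, if the model is net and the optimizer is plain SGD, the recreation itself is just one line (a minimal sketch; the names and hyperparameters are placeholders):

# after resizing/replacing layers in `net`, rebuild the optimizer so it holds
# references to the new parameter tensors (momentum buffers etc. start fresh)
opt = torch.optim.SGD(net.parameters(), lr=0.001)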

Can you provide a simple example?

Say we have:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(784, 10)
        self.fc2 = nn.Linear(10, 10)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

# instantiate the model
net = Net()

crit = nn.CrossEntropyLoss()
opt = optim.SGD(net.parameters(), lr=0.001)

inputs, labels = torch.rand(4, 784), torch.tensor([1, 2, 3, 4])

opt.zero_grad()
outputs = net(inputs)
loss = crit(outputs, labels)
loss.backward()

# New parameters for changing the hidden layer from 10 -> 15 neurons
# (fc1.weight, fc1.bias, fc2.weight, fc2.bias; note weight shapes are (out_features, in_features))
new_params = torch.rand(15, 784), torch.rand(15), torch.rand(10, 15), torch.rand(10)

# Some additional code here to insert new_params into net, replacing its layers (both sizes and values)

# end of new code; continue training the new model

opt.zero_grad()
outputs = net(inputs)
loss = crit(outputs, labels)
loss.backward()
...

How can I replace the old parameters and layer sizes with new parameters and layers?

Closest “solution”(not) I’ve found so far is to use copy_:

with torch.no_grad():
for i, p in enumerate(net.parameters()):
        p.copy_(new_params[i])

But this gives back a RuntimeError saying the sizes of the tensors need to match. Seems like I’m getting close to cracking this egg; I just need to know the right command to overwrite the layer sizes.

I got it working! With the following code:

with torch.no_grad():
    setattr(net, 'fc1', nn.Linear(784, 15))
    setattr(net, 'fc2', nn.Linear(15, 10))
    for i, p in enumerate(net.parameters()):
        p.copy_(new_params[i])

The next question is, how can I target the hyper-parameters of the layers in a loop?

for name, module in net.named_modules():
    print(name)            # >>> fc1
    print(type(module))    # >>> Linear
    print(???(module))     # >>> (784, 15)

**Edit:** I found how to get the type() from the module.py file.
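(For what it’s worth, nn.Linear stores its sizes as in_features and out_features, so one way to fill in the ??? above, for linear layers only, would be:)

for name, module in net.named_modules():
    if isinstance(module, nn.Linear):
        # e.g. fc1 Linear (784, 15)
        print(name, type(module).__name__, (module.in_features, module.out_features))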

Found a full solution, albeit a messy one that isn’t generalized for various layer types; it would have to be rewritten for, say, Conv2d, etc.

def update_features(model, attr, targ, ins, outs):
    # replace the layer stored at `attr` with a new layer of the same type and new sizes
    setattr(model, attr, type(targ)(ins, outs))

def replace_network(model, new_params):
    c = 0
    for attr in dir(model):
        targ = getattr(model, attr)
        if type(targ) == nn.Linear:
            # new_params[c] is the new weight tensor, shaped (out_features, in_features)
            print(new_params[c].size()[1], new_params[c].size()[0])
            update_features(model, attr, targ, new_params[c].size()[1], new_params[c].size()[0])
            c = c + 2  # skip past this layer's bias as well
    with torch.no_grad():
        for i, p in enumerate(model.parameters()):
            p.copy_(new_params[i])

replace_network(net, new_params)
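One caveat: dir(model) just happens to return fc1 before fc2 alphabetically here. A variant over named_children(), which follows registration order, would be a bit more robust; a rough sketch:

def replace_network(model, new_params):
    c = 0
    for name, child in model.named_children():
        if isinstance(child, nn.Linear):
            # weight shape is (out_features, in_features)
            ins, outs = new_params[c].size(1), new_params[c].size(0)
            setattr(model, name, type(child)(ins, outs))
            c += 2  # one weight + one bias per linear layer
    with torch.no_grad():
        for p, new_p in zip(model.parameters(), new_params):
            p.copy_(new_p)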

Thanks for your input @ptrblck. I certainly don’t envy your job! And sorry for all the questions; just started learning Python 2 months ago and “torch”-ing about a week ago.

Cheers

Good to hear it’s working now and you’ve figured it out!

Haha, you might call it a hobby or work, but I certainly enjoy it a lot! :)

Hello @ptrblck ,

Thought you might be curious about what I was working on. I have implemented an algorithm that dynamically splits neurons between training epochs. It works, and the results look very promising: in my research, the fixed-time results have shown around an 11% increase in accuracy and 18% lower loss on CIFAR-10 over a control group starting with the same number of parameters. I’ve uploaded the Python files and experiment results to GitHub here: GitHub - CerebralSeed/Neural-Splitting-with-Dice-Roll-in-Pytorch: This demonstrates a dynamic neural splitting module implemented in Pytorch and combined with a "best of n" starting parameters.

I’m working on unlimited-time experiments next, and so far they also look very promising, with time to best validation accuracy cut nearly in half and 2-3% higher accuracy. Will hopefully post that next week.

Cheers.

Cool, that sounds interesting, so thanks for sharing! Looking forward to the next results. Are you planning on writing a paper about your experiments once they are done?

Thanks for your help so far.

I’m not really an academic type; I just had an idea and went with it. I did write a paper and uploaded it to the GitHub repo in the Research → Limited Time Trial directory. It also has the results of the 10 test and control trials. The Python files that produced the results are all there, one script for the test and one for the control, so it should be pretty easy to reproduce the same results.


Thought you might want to know that the second set of tests is done and posted on GitHub. A third set of trials with a smaller model size is in the works. I’ve also added some information to the main readme explaining what I’m doing and the motivation behind it (this is not just arbitrarily copying parameter values). I also uploaded a very simple spreadsheet example demonstrating how splitting neurons through this method keeps the outputs unchanged through the matrix multiplication. The method only gives the model additional neurons to tweak, targeting the neurons with the highest average loss between epochs.
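(As a generic illustration of that output-preservation property, not necessarily the exact dice-roll scheme in the repo: duplicating a hidden unit and halving its outgoing weights leaves the next layer’s pre-activations unchanged.)

import torch

torch.manual_seed(0)
W1 = torch.rand(10, 784)   # hidden layer weights: 10 neurons
W2 = torch.rand(5, 10)     # next layer weights
x = torch.rand(784)

y = W2 @ (W1 @ x)

# split hidden neuron 0 into two copies and halve its outgoing weights
W1_split = torch.cat([W1, W1[0:1]], dim=0)            # now 11 hidden neurons
W2_split = torch.cat([W2, W2[:, 0:1] / 2], dim=1)     # extra outgoing column, halved
W2_split[:, 0] = W2[:, 0] / 2                         # halve the original column too

y_split = W2_split @ (W1_split @ x)
print(torch.allclose(y, y_split))  # True: the split leaves the outputs unchanged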

Now I am trying to automate passing the optimizer references between epochs, without having to recreate the optimizer. I’ve updated the optimizer.state tensors, but it seems there is more to it than that, just like when I updated the model attributes via the update_features/replace_network code above.


Is there a similar method for updating the optimizer’s param_groups?

Thanks for the update!
If you are assigning new parameters to the model and want to add them to the optimizer, you could use optimizer.add_param_group and pass these new params to it.
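A minimal sketch of how that could look with the earlier toy example (the layer name and sizes are just placeholders from above):

# swap in a resized layer, then register its fresh parameters with the
# existing optimizer instead of rebuilding it
new_fc2 = nn.Linear(15, 10)
setattr(net, 'fc2', new_fc2)

# the new tensors join their own param group (lr etc. fall back to the optimizer defaults);
# note the replaced layer's old tensors still sit in the original group unless removed
opt.add_param_group({'params': list(new_fc2.parameters())})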