Training a progressive growing network

I’m trying to dynamically add new modules to my network every N steps, as in Progressive Growing of GANs.

But I came across this info from @Carl about PyTorch from here:

Dynamically adding/removing modules is relatively easy in Tensorflow/Keras, since a graph of the model is available on the python side. In PyTorch, you cannot traverse the graph of your model to insert modules.

@Carl also mentions that we can make:

layers act like the identity function

and train the network.

Questions:

  • How do I dynamically add modules to a pre-existing model?
  • How do I update the optimizer (state_dict) without losing old info?
  • How can I use nn.Identity in my model initially and replace it with Conv2d after N steps? (A rough sketch of what I mean is right below.)
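
To make the third question concrete, here is a rough sketch of what I have in mind for the nn.Identity approach (the GrowingNet name and the layer shapes are just placeholders):

import torch
import torch.nn as nn

class GrowingNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.convblock0 = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.convblock1 = nn.Identity()  # placeholder that just passes activations through

    def forward(self, x):
        return self.convblock1(self.convblock0(x))

model = GrowingNet()
out = model(torch.randn(1, 3, 32, 32))

# after N steps: can I simply overwrite the attribute with a real layer?
model.convblock1 = nn.Conv2d(16, 16, kernel_size=3, padding=1)
out = model(torch.randn(1, 3, 32, 32))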

Any help is appreciated. Thank you.

Dynamically adding new params to the optimizer seems to be harder than I thought in PyTorch. Even though I found a solution to do it here, the author of that post doesn’t recommend using it:

However, here’s the reason why you should probably never update your optimizer like this, but should instead re-initialize from scratch, and just accept the loss of state information

So - to sum it up: I’d really recommend to try to keep it simple, and to only change a parameter as conservatively as possible, and not to touch the optimizer.

I tried re-initializing the optimizer from scratch whenever a new module is added to the network, and I saw a huge drop in validation performance which I can’t sacrifice. Is there a better way to solve this problem?

cc: @ptrblck

You can use model.add_module, but you would also need to change the forward method or use this new module manually.
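
For example, something along these lines (the layer names and shapes are only illustrative):

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.convblock0 = nn.Conv2d(3, 16, kernel_size=3, padding=1)

    def forward(self, x):
        return self.convblock0(x)

model = Net()
# registers the submodule and its parameters under the name "convblock1"
model.add_module("convblock1", nn.Conv2d(16, 16, kernel_size=3, padding=1))

# the new block is not executed by the original forward, so either change
# forward or call it manually:
x = torch.randn(1, 3, 8, 8)
out = model.convblock1(model(x))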

optimizer.add_param_group can be used to add new parameters.

This might work, but I would recommend double-checking that this layer is really used (e.g. via forward hooks) and making sure the parameters are added to the optimizer.
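
Continuing that sketch, something like this could be used to add the new parameters and to verify the layer is actually executed (the hook is just one way to check):

optimizer = torch.optim.Adam(model.convblock0.parameters(), lr=1e-3)

# add the new block's parameters to the optimizer
optimizer.add_param_group({"params": model.convblock1.parameters()})
print(len(optimizer.param_groups))  # should now be 2

# verify via a forward hook that the new layer really runs
def check_usage(module, inp, out):
    print(f"{module.__class__.__name__} was used")

handle = model.convblock1.register_forward_hook(check_usage)
out = model.convblock1(model(x))  # prints the message if the layer is executed
handle.remove()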


Thank you. This is how I want to add new blocks to the network dynamically. Can I use add_module as shown below?

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        # some list of modules; named block_list so it doesn't shadow nn.Module.modules()
        self.block_list = [convblock1, convblock2, convblock3]
        self.convblock0 = self.conv(...)
        self.add_module("convblock1", self.block_list[0])  # this doesn't look like the recommended way to use it
        self.add_module("convblock2", self.block_list[1])
        self.add_module("convblock3", self.block_list[2])

    def forward(self, x, epoch_num):
        x = self.convblock0(x)
        # grow the forward path as training progresses
        if epoch_num >= 2:
            x = self.convblock1(x)
        if epoch_num >= 4:
            x = self.convblock2(x)
        if epoch_num >= 6:
            x = self.convblock3(x)
        return x

OR

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        # some list of modules; a plain Python list doesn't register them,
        # so they only become part of the model once add_module is called
        self.block_list = [convblock1, convblock2, convblock3]
        self.convblock0 = self.conv(...)

    def forward(self, x, epoch_num):
        x = self.convblock0(x)
        if epoch_num == 2:  # can I use add_module inside forward?
            self.add_module("convblock1", self.block_list[0])
        if epoch_num >= 2:
            x = self.convblock1(x)
        if epoch_num == 4:
            self.add_module("convblock2", self.block_list[1])
        if epoch_num >= 4:
            x = self.convblock2(x)
        if epoch_num == 6:
            self.add_module("convblock3", self.block_list[2])
        if epoch_num >= 6:
            x = self.convblock3(x)
        return x

Is this how I should update the optimizer after adding a new module to the model?

model = MyModel()
optimizer = torch.optim.Adam(model.convblock0.parameters(), lr=learning_rate)
for epoch in range(10):
    # register the parameters of each newly activated block with the optimizer
    if epoch == 2:
        new_par_dict = dict()
        new_par_dict["params"] = model.block_list[0].parameters()
        optimizer.add_param_group(new_par_dict)
    if epoch == 4:
        new_par_dict = dict()
        new_par_dict["params"] = model.block_list[1].parameters()
        optimizer.add_param_group(new_par_dict)
    if epoch == 6:
        new_par_dict = dict()
        new_par_dict["params"] = model.block_list[2].parameters()
        optimizer.add_param_group(new_par_dict)
    output = model(input, epoch)
    loss = loss_fun(output)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

But add_param_group expects a dict. Is there a faster way to get the param group dict of a specific layer? And does this preserve the previous state of the optimizer?
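
(For completeness, the most compact form of the same update I could come up with so far builds the dict inline; grow_at is just an illustrative name for an epoch-to-block mapping.)

grow_at = {2: 0, 4: 1, 6: 2}  # epoch -> index of the block to activate

for epoch in range(10):
    if epoch in grow_at:
        # pass the dict inline instead of building it by hand
        optimizer.add_param_group({"params": model.block_list[grow_at[epoch]].parameters()})
    output = model(input, epoch)
    loss = loss_fun(output)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()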

cc: @ptrblck

Why can’t I just append my list of new parameter tensors to the current optimizer’s state_dict?

model = MyModel()
optimizer = torch.optim.Adam(model.convblock0.parameters(), lr=learning_rate)
state_dict = optimizer.state_dict()
for epoch in range(10):
    if epoch == 2:
        new_par = list(model.block_list[0].parameters())
        state_dict['param_groups'][0]['params'].append(new_par)
...

Does this preserve the previous state of the optimizer? Any help is appreciated. Thank you.
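
As a sanity check, I was planning to snapshot the optimizer state before and after the change, whichever way I end up modifying it (just a sketch, reusing the optimizer from above; the state is only populated after at least one optimizer.step()):

# which parameters currently have Adam state (exp_avg / exp_avg_sq)?
before = {id(p) for p in optimizer.state}

# ... modify the optimizer here (add_param_group, state_dict surgery, ...) ...

after = {id(p) for p in optimizer.state}
print(before <= after)  # True would mean the old per-parameter state survived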

cc: @albanD @smth