Incremental Learning

I have a question about how incremental learning can be done in PyTorch:

Suppose I have trained a CNN model and now would like to add, say, k more neurons to a layer (or to every layer) while keeping the pretrained weights. How can I do this?

I would appreciate a quick response, as I need to do this for a project today. @ptrblck can you help?

Haha :smile: It's better to look at the documentation on transfer learning / fine-tuning:
https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html

I have been over this tutorial many times before, and it does not contain what I am looking for. Let's say I have an h*w*n filter bank but only want to fill in h*w*k of those filters with pretrained weights. How can this be done?

This might be a bit tricky, as you would also have to take care of the following layers, which now receive a different number of input features. Also, if your optimizer has internal states, these might be lost.
Anyway, here is a small example:

import torch
import torch.nn as nn
import torch.nn.functional as F


class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.conv = nn.Conv2d(1, 3, 3, 1, 1)
        self.fc = nn.Linear(3*12*12, 10)
        
    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv(x)), 2)
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        return x


data = torch.randn(10, 1, 24, 24)
target = torch.empty(10, dtype=torch.long).random_(10)

model = MyModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Train for a few epochs
for epoch in range(5):
    optimizer.zero_grad()
    output = model(data)
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()
    print('Epoch {}, loss {}'.format(epoch, loss.item()))


# Add filters to the conv layer while keeping the pretrained ones
nb_filters_old = model.conv.weight.size(0)
nb_filters_new = 6

with torch.no_grad():
    old_conv_weights = model.conv.weight
    old_conv_bias = model.conv.bias
    old_fc_weight = model.fc.weight
    
    # Assign new weight and bias (initialize with proper values?)
#    model.conv.weight = nn.Parameter(
#        torch.randn(nb_filters_new, *model.conv.weight.size()[1:]))
#    model.conv.bias = nn.Parameter(torch.randn(nb_filters_new))
#    model.fc.weight = nn.Parameter(
#        torch.randn(old_fc_weight.size(0),
#                    int(model.fc.weight.size(1)/nb_filters_old*nb_filters_new)))
    
    # Alternatively create new layers
    model.conv = nn.Conv2d(old_conv_weights.size(1), nb_filters_new, 3, 1, 1)
    model.fc = nn.Linear(
        int(model.fc.weight.size(1)/nb_filters_old*nb_filters_new),
        old_fc_weight.size(0))
    
    
    # Set pretrained values
    model.conv.weight[:nb_filters_old] = old_conv_weights
    model.conv.bias[:nb_filters_old] = old_conv_bias
    model.fc.weight[:, :old_fc_weight.size(1)] = old_fc_weight


# Create new optimizer (running estimates will be lost)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Train some more
for epoch in range(5):
    optimizer.zero_grad()
    output = model(data)
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()
    print('Epoch {}, loss {}'.format(epoch, loss.item()))

Thanks a ton @ptrblck I will try this and report back.

Thanks @ptrblck, this does seem like a feasible solution. Is there a way to partially freeze the layers as well? That is, freeze the older learned weights and train only the new, randomly initialized portion, i.e. freeze only part of each layer?

In that case you should create separate layers, i.e. one conv layer containing the pretrained filters and another one with the random weights, and concatenate their outputs after both layers have been applied (see the sketch below).
This would make it possible to optimize just the random weights and freeze the old ones.
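
Here is a minimal sketch of that idea, reusing the toy model and the old_conv_weights / old_conv_bias tensors from above. The class and attribute names (SplitConvModel, conv_old, conv_new) are just illustrative, and the new fc layer is reinitialized for simplicity:

class SplitConvModel(nn.Module):
    def __init__(self, old_conv_weight, old_conv_bias, nb_new_filters=3):
        super(SplitConvModel, self).__init__()
        in_channels = old_conv_weight.size(1)
        nb_old_filters = old_conv_weight.size(0)
        
        # Conv layer holding the pretrained filters (frozen)
        self.conv_old = nn.Conv2d(in_channels, nb_old_filters, 3, 1, 1)
        with torch.no_grad():
            self.conv_old.weight.copy_(old_conv_weight)
            self.conv_old.bias.copy_(old_conv_bias)
        for param in self.conv_old.parameters():
            param.requires_grad = False
        
        # Conv layer with randomly initialized filters (trainable)
        self.conv_new = nn.Conv2d(in_channels, nb_new_filters, 3, 1, 1)
        
        self.fc = nn.Linear((nb_old_filters + nb_new_filters) * 12 * 12, 10)
        
    def forward(self, x):
        # Concatenate both outputs along the channel dimension
        x = torch.cat((self.conv_old(x), self.conv_new(x)), dim=1)
        x = F.max_pool2d(F.relu(x), 2)
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        return x


model = SplitConvModel(old_conv_weights, old_conv_bias, nb_new_filters=3)
# Only pass the trainable parameters to the optimizer
optimizer = torch.optim.Adam(
    filter(lambda p: p.requires_grad, model.parameters()), lr=1e-3)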


Hi, I think I understand what you mean. For example, given a 3*3 convolution kernel, I want to fix the weights on the diagonal and learn the weights at the other positions. How can I do this?

As @albanD explained in the other topic, you would have to zero out the gradients of your fixed parameters.
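
A minimal sketch of that approach, using an illustrative mask that zeroes the gradient on the diagonal of a 3*3 kernel via a hook (note this assumes a plain SGD step; with weight decay or momentum the fixed weights could still change slightly, in which case you could restore them after each step):

conv = nn.Conv2d(1, 1, 3, 1, 1)

# 0 = fixed (diagonal) position, 1 = trainable position
mask = (1 - torch.eye(3)).view(1, 1, 3, 3)

# Multiply the gradient by the mask every time it is computed,
# so the diagonal weights never receive an update
conv.weight.register_hook(lambda grad: grad * mask)

optimizer = torch.optim.SGD(conv.parameters(), lr=1e-2)

x = torch.randn(1, 1, 8, 8)
out = conv(x)
out.mean().backward()
optimizer.step()  # diagonal weights keep their initial values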

Thank you very much, I will try it!