Incremental Learning

I have a question about how incremental learning can be done in PyTorch:

Suppose I have trained a CNN model and now would like to add, say, k more neurons to a layer (or to every layer) while keeping the pretrained weights. How can I do this?

I would appreciate a quick response, as I need to do this for a project today. @ptrblck can you help?

Haha :smile: It's better to look at the documentation on transfer learning / fine-tuning:
https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html

I have been over this tutorial many times before, and it does not contain what I am looking for. Let's say I have an h*w*n filter bank but only want to fill in h*w*k of those filters with pretrained weights. How can this be done?

This might be a bit tricky, as you would also have to take care of the following layers, which now receive a different number of input features. Also, if your optimizer has internal states, these might be lost.
Anyway, here is a small example:

import torch
import torch.nn as nn
import torch.nn.functional as F


class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.conv = nn.Conv2d(1, 3, 3, 1, 1)
        self.fc = nn.Linear(3*12*12, 10)
        
    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv(x)), 2)
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        return x


data = torch.randn(10, 1, 24, 24)
target = torch.empty(10, dtype=torch.long).random_(10)

model = MyModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Train for a few epochs
for epoch in range(5):
    optimizer.zero_grad()
    output = model(data)
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()
    print('Epoch {}, loss {}'.format(epoch, loss.item()))


# Add filters to the conv layer while keeping the pretrained ones
nb_filters_old = model.conv.weight.size(0)
nb_filters_new = 6

with torch.no_grad():
    old_conv_weights = model.conv.weight
    old_conv_bias = model.conv.bias
    old_fc_weight = model.fc.weight
    
    # Assign new weight and bias (initialize with proper values?)
#    model.conv.weight = nn.Parameter(
#        torch.randn(nb_filters_new, *model.conv.weight.size()[1:]))
#    model.conv.bias = nn.Parameter(torch.randn(nb_filters_new))
#    model.fc.weight = nn.Parameter(
#        torch.randn(old_fc_weight.size(0),
#                    int(model.fc.weight.size(1)/nb_filters_old*nb_filters_new)))
    
    # Alternatively create new layers
    model.conv = nn.Conv2d(old_conv_weights.size(1), nb_filters_new, 3, 1, 1)
    model.fc = nn.Linear(
        int(model.fc.weight.size(1)/nb_filters_old*nb_filters_new),
        old_fc_weight.size(0))
    
    
    # Set pretrained values
    model.conv.weight[:nb_filters_old] = old_conv_weights
    model.conv.bias[:nb_filters_old] = old_conv_bias
    model.fc.weight[:, :old_fc_weight.size(1)] = old_fc_weight


# Create new optimizer (running estimates will be lost)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Train some more
for epoch in range(5):
    optimizer.zero_grad()
    output = model(data)
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()
    print('Epoch {}, loss {}'.format(epoch, loss.item()))

Thanks a ton @ptrblck I will try this and report back.

Thanks @ptrblck, this does seem like a feasible solution. Is there a way to partially freeze the layers as well? That is, freeze the older learned weights and train only the new, randomly initialized portion, i.e. freeze only part of each layer?

In that case you should create separate layers, i.e. one conv layer containing the pretrained filters and another one with the random weights, and concatenate their outputs after both layers have been applied (see the sketch below).
This would make it possible to optimize just the random weights and freeze the old ones.
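
Here is a minimal sketch of that idea, reusing the toy model and the old_conv_weights / old_conv_bias tensors from above. The class and attribute names (SplitConvModel, conv_old, conv_new) are just illustrative, and the new fc layer is reinitialized for simplicity:

class SplitConvModel(nn.Module):
    def __init__(self, old_conv_weight, old_conv_bias, nb_new_filters=3):
        super(SplitConvModel, self).__init__()
        in_channels = old_conv_weight.size(1)
        nb_old_filters = old_conv_weight.size(0)
        
        # Conv layer holding the pretrained filters (frozen)
        self.conv_old = nn.Conv2d(in_channels, nb_old_filters, 3, 1, 1)
        with torch.no_grad():
            self.conv_old.weight.copy_(old_conv_weight)
            self.conv_old.bias.copy_(old_conv_bias)
        for param in self.conv_old.parameters():
            param.requires_grad = False
        
        # Conv layer with randomly initialized filters (trainable)
        self.conv_new = nn.Conv2d(in_channels, nb_new_filters, 3, 1, 1)
        
        self.fc = nn.Linear((nb_old_filters + nb_new_filters) * 12 * 12, 10)
        
    def forward(self, x):
        # Concatenate both outputs along the channel dimension
        x = torch.cat((self.conv_old(x), self.conv_new(x)), dim=1)
        x = F.max_pool2d(F.relu(x), 2)
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        return x


model = SplitConvModel(old_conv_weights, old_conv_bias, nb_new_filters=3)
# Only pass the trainable parameters to the optimizer
optimizer = torch.optim.Adam(
    filter(lambda p: p.requires_grad, model.parameters()), lr=1e-3)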


Hi, I think I understand what you mean. For example, given a 3*3 convolution kernel, I want to fix the weights on the diagonal and learn the weights at the other positions. How can I do this?

As @albanD explained in the other topic, you would have to zero out the gradients of your fixed parameters.
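
A minimal sketch of that approach, using an illustrative mask that zeroes the gradient on the diagonal of a 3*3 kernel via a hook (note this assumes a plain SGD step; with weight decay or momentum the fixed weights could still change slightly, in which case you could restore them after each step):

conv = nn.Conv2d(1, 1, 3, 1, 1)

# 0 = fixed (diagonal) position, 1 = trainable position
mask = (1 - torch.eye(3)).view(1, 1, 3, 3)

# Multiply the gradient by the mask every time it is computed,
# so the diagonal weights never receive an update
conv.weight.register_hook(lambda grad: grad * mask)

optimizer = torch.optim.SGD(conv.parameters(), lr=1e-2)

x = torch.randn(1, 1, 8, 8)
out = conv(x)
out.mean().backward()
optimizer.step()  # diagonal weights keep their initial values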

Thank you very much, I will try it!