How to add new nodes to the last layer (fully connected layer)?

Hi everyone,

I used a pre-trained ResNet50 as my base model, with 50 classes at the last layer (fully connected layer). Training has already been performed and worked well.

However, I now want to add 5 new nodes (i.e., new tasks) to the last layer. Is it possible to add the new nodes without creating a new layer, since I want to keep the existing nodes, which have already been trained well?

Can anyone help me? Many thanks, everyone!


Hello Nehemia -

I am hardly an expert, so take this as a not-very-well-informed
suggestion:

It would seem reasonable to me to simply add your new output
nodes with zero or small (or random) values for their weights.

In more detail, let’s assume (just to have some numbers) that
your next-to-last layer has 100 outputs. Then your last layer
(with 50 outputs), being fully connected and including bias
weights, will have 100 × 50 + 50 = 5050 weights.

Make a new model that is the same except that its last layer
has 55 outputs. This new last layer still has 100 inputs, so it
will now have 100 × 55 + 55 = 5555 weights (including biases).
Set the weights that connect to your original 50 outputs to
their pre-trained values, but initialize the weights to the 5
new outputs with some sort of “random” values.

Experts should be able to give you best practices for initializing
the new weights, but I would probably try something like random
values at roughly 20% of the magnitude of the typical pre-trained
weights.
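
Here is a minimal sketch of what that could look like in PyTorch,
assuming the old head is an nn.Linear with 100 inputs and 50
outputs (the names old_fc / new_fc and the 20% scale are just
illustrative, not a best practice):

import torch
import torch.nn as nn

# Stand-in for the pre-trained head: 100 inputs -> 50 outputs
old_fc = nn.Linear(100, 50)

# Replacement head with 5 extra outputs
new_fc = nn.Linear(100, 55)

with torch.no_grad():
    # Copy the trained weights and biases into the first 50 rows
    new_fc.weight[:50] = old_fc.weight
    new_fc.bias[:50] = old_fc.bias
    # Random init for the 5 new rows, scaled to roughly 20% of
    # the typical magnitude of the pre-trained weights
    scale = 0.2 * old_fc.weight.abs().mean()
    new_fc.weight[50:] = scale * torch.randn(5, 100)
    new_fc.bias[50:] = 0.0

For a torchvision ResNet50 you would then assign the new layer
back with something like model.fc = new_fc and continue training.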

Initially your five new task prediction values will come out random,
but should get better as you further train your model. And the
“feature extraction” encoded in the earlier layers should still
work well.

Best regards.

K. Frank.


Hi Frank,

Thanks for your explanation. Yes, that is what I am about to do in PyTorch. However, I don’t know how to add the new nodes, link the weights and biases, and randomize the initial weights in code. Could you give me some examples in a script, or a URL where I can learn this? Thanks in advance.

Hello,

Because the weight shape of the new last layer will not match the weight shape of the old last layer, I think you have to do some of this manually: pad the old last layer’s weight to the same shape as the new last layer’s weight with random values, zeros, or ones, and then call load_state_dict as usual.
I wrote a simple demo for you:

import torch
import torch.nn as nn

class LinearNet(nn.Module):
    def __init__(self):
        super(LinearNet, self).__init__()
        self.linear = nn.Linear(5, 5, bias=False)
        self.last_linear_layer = nn.Linear(5, 10, bias=False)  # old head: 10 outputs

    def forward(self, x):
        return self.last_linear_layer(self.linear(x))

class New_LinearNet(nn.Module):
    def __init__(self):
        super(New_LinearNet, self).__init__()
        self.linear = nn.Linear(5, 5, bias=False)
        self.last_linear_layer = nn.Linear(5, 15, bias=False)  # new head: 15 outputs

    def forward(self, x):
        return self.last_linear_layer(self.linear(x))

# Train the old model for a few steps on dummy data
old_model = LinearNet()
random_input = torch.randn(5, 5)
random_target = torch.randn(5, 10)  # match the output shape (batch 5, 10 outputs)
criterion = nn.MSELoss()
opt = torch.optim.SGD(old_model.parameters(), lr=0.001)
for i in range(5):
    opt.zero_grad()
    output = old_model(random_input)
    loss = criterion(output, random_target)
    loss.backward()
    opt.step()

# Pad the old head's weight with 5 extra rows for the new outputs
torch.save(old_model.state_dict(), 'old_model.pth')
ckpt = torch.load('old_model.pth')
new_part = torch.randn(5, 5)  # random init for the 5 new output rows
                              # (you could also scale these down, as K. Frank suggested)
ckpt['last_linear_layer.weight'] = torch.cat([ckpt['last_linear_layer.weight'], new_part], dim=0)

# Load the padded checkpoint into the new model and keep training
new_model = New_LinearNet()
new_model.load_state_dict(ckpt)
opt_new = torch.optim.SGD(new_model.parameters(), lr=0.001)
random_target_new = torch.randn(5, 15)  # match the new output shape
for i in range(5):
    opt_new.zero_grad()
    output = new_model(random_input)
    loss = criterion(output, random_target_new)
    loss.backward()
    opt_new.step()
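
If you want to sanity-check that the pre-trained rows survived the transfer, you could compare them directly against the old model:

# The first 10 rows of the new head should equal the old trained weights
assert torch.equal(new_model.last_linear_layer.weight[:10],
                   old_model.last_linear_layer.weight)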

I am not sure if this snippet meets your needs, so if it works please let me know~


Thank you so much for your clear explanation, @MariosOreo. It answered my question.

Thanks for your clear explanation, @K.Frank!