Add neurons to an existing layer

I am trying an approach to a specific problem that requires me to add n neurons to the last layer of a pre-trained model. The tricky part is that they have to go into the same layer. I found an example in the forums here: older question

I want to keep the pre-trained weights intact and add n randomly initialized ones.

In that question, a code snippet was given, which I adapted to my problem. It looks like this:

def add_units(self, n_new):
    '''
    n_new : integer variable counting the neurons you want to add
    '''

    # take a copy of the current weights stored in self._fc, which is a
    # ModuleList with only one linear layer
    current = [ix.weight.data for ix in self._fc]

    # randomly initialize the weights for the n_new additional input connections
    # (nn.Linear stores its weight as (out_features, in_features))
    hl_input = torch.zeros([current[0].shape[0], n_new])
    nn.init.xavier_uniform_(hl_input, gain=nn.init.calculate_gain('relu'))

    # concatenate the old weights with the new weights along the input dimension
    new_wi = torch.cat([current[0], hl_input], dim=1)

    # reset weight and grad variables to new size
    self._fc[0] = nn.Linear(current[0].shape[1]+n_new, 2)

    # set the weight data to new values
    self._fc[0].weight.data = torch.tensor(new_wi, requires_grad=True)

This method is inside the model class and can be called by typing “model.add_units(N)”.
Am I preserving the original weights of the layer “_fc” and adding new, randomly initialized ones, or am I missing something?

Running a quick test, adding just two nodes seems to print the right shape:

[In]   print(model._fc)
[Out]  Linear(in_features=2560, out_features=2)
[In]   model.add_units(2)
[In]   print(model._fc)
[Out]  Linear(in_features=2562, out_features=2)

I am not sure whether doing this preserves the weights and the bias; I assumed that the last line did, but I cannot know for certain.

To make sure the old weights are present, you could simply print the old and new weight tensors and compare them, or use a proper comparison via old_weight == new_weight[:, :old_weight_num].
I'm not sure how ix etc. are defined, so I cannot tell whether the creation is correct.

I would recommend avoiding the .data attribute, as it might have unwanted side effects, and instead assigning a new nn.Parameter after the manipulation.
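
For illustration, a minimal standalone sketch of such a check with made-up sizes (not the exact model from the question):

import torch
import torch.nn as nn

old_fc = nn.Linear(2560, 2)                      # stands in for the pre-trained self._fc[0]
old_weight = old_fc.weight.detach().clone()

n_new = 2
extra = torch.empty(old_weight.shape[0], n_new)  # weights for the new input connections
nn.init.xavier_uniform_(extra, gain=nn.init.calculate_gain('relu'))

new_fc = nn.Linear(2560 + n_new, 2)
new_fc.weight = nn.Parameter(torch.cat([old_weight, extra], dim=1))  # no .data involved

# the old weights should sit unchanged in the first columns
print(torch.equal(old_weight, new_fc.weight[:, :old_weight.shape[1]]))  # -> True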

Thank you for your answer. Regarding the last part, if I understand correctly, the code should be changed to:

self._fc[0].weight = torch.nn.Parameter(new_wi)

Will this update the parameters of the assigned layer with the values of “new_wi”?

Yes, this should work.

To make the problem more comprehensible, I changed from a ModuleList to a plain layer, so the code now has the following form:

def add_units(self, n_new):
    '''
    n_new : integer variable counting the neurons you want to add
    '''

    # take a copy of the current weights stored in self._fc
    current = self._fc.weight.data
    current_bias = self._fc.bias.data #Only used at the end of the post

    # randomly initialize the weights for the n_new additional input connections
    hl_input = torch.zeros([current.shape[0], n_new])
    nn.init.xavier_uniform_(hl_input, gain=nn.init.calculate_gain('relu'))

    # concatenate the old weights with the new weights along the input dimension
    new_wi = torch.cat([current, hl_input], dim=1)

    # reset weight and grad variables to new size
    self._fc = nn.Linear(current.shape[1]+n_new, 2) #2 is the size of my output layer

    # set the weight data to new values
    self._fc.weight = torch.nn.Parameter(new_wi)

My question is: by doing this, am I also copying the bias or only the weights? If I try:

current_bias  = self._fc.bias.data

the dimension that I get equals the output dimension, not the input dimension. What changes should I make to correctly add the old biases into this new layer?

self._fc.bias = current_bias #Does not throw an error but doesn't look right.

Diving into the nn.Linear documentation I found this:

'''
Attributes:
        weight: the learnable weights of the module of shape
            :math:`(\text{out\_features}, \text{in\_features})`. The values are
            initialized from :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})`, where
            :math:`k = \frac{1}{\text{in\_features}}`
        bias:   the learnable bias of the module of shape :math:`(\text{out\_features})`.
                If :attr:`bias` is ``True``, the values are initialized from
                :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})` where
                :math:`k = \frac{1}{\text{in\_features}}`
'''

So it does make sense that the size equals my output size, but I'm still unsure whether my method updates the bias correctly.

If you are not changing the number of output features, you could just reuse the bias.
Your approach should work, aside from the usage of .data, which should be avoided. :wink:

So updating the weights will update the bias automatically? And what do you mean by avoiding .data? How would you do that?

Also, do you know why my weight.grad is None when I do this? Is there something I have to do to initialize it?

No, that’s not the case. Since the bias is defined by the number of out_features, it doesn’t have to be changed if only the in_features differ.

Don’t use the .data attribute, but reassign a new nn.Parameter to the .weight attribute.
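
For example, a sketch of the end of your add_units method without .data (reusing the current, new_wi, and n_new variables from your snippet; not tested against your full model):

    # capture the old bias before replacing the layer; its shape is (out_features,) = (2,)
    old_bias = self._fc.bias.detach().clone()

    # rebuild the layer and reassign both parameters directly as nn.Parameter
    self._fc = nn.Linear(current.shape[1] + n_new, 2)
    self._fc.weight = nn.Parameter(new_wi)
    self._fc.bias = nn.Parameter(old_bias)  # out_features is unchanged, so the old bias can be reused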

Maybe you are detaching the computation graph, so could you post a minimal, executable code snippet showing this issue?


This is the function I’ve been trying. I based it mostly on a similar post to this one, so the code there and here is largely the same.

coords tells the function which layer we’re adding neurons to, num_neurons is how many neurons to add, and duplicate says whether to initialize the new neurons randomly or copy the values of existing neurons into them.

def add_neurons(self, coords, num_neurons, duplicate=False):
    '''Add new neurons to a layer in the model'''
    # Copy the current weights of every layer
    current = [layer.weight.data for layer in self.layers]

    # New weight rows for the grown layer (its outputs) and new weight columns
    # for the following layer (its inputs). Copy the values of the last
    # num_neurons existing neurons if duplicating, otherwise initialize the
    # new weights randomly.
    if duplicate:
        hl_input = current[coords][-num_neurons:, :].clone()
        hl_output = current[coords+1][:, -num_neurons:].clone()
    else:
        hl_input = torch.zeros([num_neurons, current[coords].shape[1]])
        nn.init.xavier_uniform_(hl_input, gain=nn.init.calculate_gain('relu'))
        hl_output = torch.zeros([current[coords+1].shape[0], num_neurons])
        nn.init.xavier_uniform_(hl_output, gain=nn.init.calculate_gain('relu'))

    # Concatenate the old I/O weights with the new I/O weights
    new_wi = torch.cat([current[coords], hl_input], dim=0)
    new_wo = torch.cat([current[coords+1], hl_output], dim=1)

    # Record the new size and rebuild the two affected layers
    self.layer_sizes[coords] += num_neurons
    self.layers[coords] = nn.Linear(current[coords].shape[1], self.layer_sizes[coords])
    self.layers[coords+1] = nn.Linear(self.layer_sizes[coords], current[coords+1].shape[0])

    # Set the weights to the new values
    self.layers[coords].weight = torch.nn.Parameter(new_wi)
    self.layers[coords+1].weight = torch.nn.Parameter(new_wo)
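
As a quick sanity check (just a sketch, where model stands for an instance of my network), I compare the weights of the grown layer before and after the call:

old_w = model.layers[0].weight.detach().clone()
model.add_neurons(coords=0, num_neurons=2)
print(torch.equal(old_w, model.layers[0].weight[:old_w.shape[0], :]))  # expect True: old rows are kept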

Your code is unfortunately not executable, so I cannot reproduce the issue and debug it; I also don’t see any obvious mistakes.

Here’s my entire model class, if that helps.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from random import randrange

class QNet(nn.Module):
    def __init__(self, input_size, output_size):
        super().__init__()

        # 
        # Build the model
        #
        self.layers = nn.ModuleList()
        self.layer_sizes = []

        # Add hidden layers
        self.num_layers = 0
        # Randomized hidden layers
        for i in range(randrange(1, 3)):
            size = randrange(32, 256)
            self.layers.append(nn.Linear(input_size, size))
            input_size = size  # For the next layer
            self.layer_sizes.append(input_size)
            self.num_layers += 1
        
        # Output layer
        self.layers.append(nn.Linear(input_size, output_size))
        self.layer_sizes.append(output_size)
        self.num_layers += 1

        # 
        # Set whether or not to use gpu
        self.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
        
        #
        # Set optimizer and loss
        self.optimizer = optim.Adam(self.parameters(), lr=LR)  # LR is a learning-rate constant defined elsewhere in my code
        self.criterion = nn.MSELoss()


    def forward(self, x):
        '''Get output from model.'''
        #for layer in self.layers: x = layer(x)
        for i, layer in enumerate(self.layers): x = layer(x) if i == self.num_layers - 1 else F.relu(layer(x))
        return x


    def add_neurons(self, coords, num_neurons, duplicate=False):
        '''Add new neurons to a layer in the model'''
        # Copy the current weights of every layer
        current = [layer.weight.data for layer in self.layers]

        # New weight rows for the grown layer (its outputs) and new weight columns
        # for the following layer (its inputs). Copy the values of the last
        # num_neurons existing neurons if duplicating, otherwise initialize the
        # new weights randomly.
        if duplicate:
            hl_input = current[coords][-num_neurons:, :].clone()
            hl_output = current[coords+1][:, -num_neurons:].clone()
        else:
            hl_input = torch.zeros([num_neurons, current[coords].shape[1]])
            nn.init.xavier_uniform_(hl_input, gain=nn.init.calculate_gain('relu'))
            hl_output = torch.zeros([current[coords+1].shape[0], num_neurons])
            nn.init.xavier_uniform_(hl_output, gain=nn.init.calculate_gain('relu'))

        # Concatenate the old I/O weights with the new I/O weights
        new_wi = torch.cat([current[coords], hl_input], dim=0)
        new_wo = torch.cat([current[coords+1], hl_output], dim=1)

        # Record the new size and rebuild the two affected layers
        self.layer_sizes[coords] += num_neurons
        self.layers[coords] = nn.Linear(current[coords].shape[1], self.layer_sizes[coords])
        self.layers[coords+1] = nn.Linear(self.layer_sizes[coords], current[coords+1].shape[0])

        # Set the weights to the new values
        self.layers[coords].weight = torch.nn.Parameter(new_wi)
        self.layers[coords+1].weight = torch.nn.Parameter(new_wo)

One workaround may be to add a new head to your network, since you just want to add to the last layer. The advantage of this versus the above approach is that the optimizer state (e.g. Adam momentum/decay parameters) for the existing parameters will be preserved. Just make sure to calculate the loss for your new head.
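
A rough sketch of that idea (hypothetical module and variable names, assuming the pre-trained network can be split into a feature extractor and its existing 2-unit head):

import torch
import torch.nn as nn

class TwoHeadModel(nn.Module):
    def __init__(self, backbone, old_head, n_new):
        super().__init__()
        self.backbone = backbone  # pre-trained feature extractor
        self.old_head = old_head  # pre-trained nn.Linear(feat_dim, 2)
        self.new_head = nn.Linear(old_head.in_features, n_new)  # randomly initialized extra units

    def forward(self, x):
        feats = self.backbone(x)
        # concatenate both heads so downstream code still sees a single output "layer"
        return torch.cat([self.old_head(feats), self.new_head(feats)], dim=1)

# keep the existing optimizer (and its Adam state) and register only the new parameters:
# optimizer.add_param_group({'params': model.new_head.parameters()})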