ValueError: loaded state dict has a different number of parameter groups

Hi,

I trained a model and then continued training the same model with an additional layer, for which I needed to add the new parameters to the optimizer with this line:

scheduled_optim._optimizer.add_param_group({'params': model.speaker_encoder.stl.parameters()})

Note that speaker_encoder.stl is the newly added layer.

Then I added a new loss to the existing model and wanted to continue training with the loaded optimizer, but it throws an error:
ValueError: loaded state dict has a different number of parameter groups

What can you suggest?

I don’t know exactly which code raises the error, but it seems you might be trying to load an optimizer state_dict after manipulating the parameter groups?
If so, could you try to restore the optimizer before adding the new parameter groups?
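The check itself is in Optimizer.load_state_dict, which raises this ValueError when len(optimizer.param_groups) differs from the number of groups stored in the checkpoint. So the optimizer has to contain exactly as many param groups as the checkpoint at the moment you restore it. A minimal sketch of the order I mean (the optimizer type, lr, file name, and the STL constructor are placeholders):

# assuming `model` does not contain speaker_encoder.stl yet
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
optimizer.load_state_dict(torch.load('optimizer_100k.pth'))  # one group vs. one group: OK

# attach the new layer and register it only afterwards
model.speaker_encoder.stl = STL()  # placeholder for however the layer is built
optimizer.add_param_group({'params': model.speaker_encoder.stl.parameters()})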

Let's imagine a three-stage training.

  1. I train some model and save it at step 100k to continue later.
  2. I need to add additional parameters to the previous model and train for another 100k steps.
    For that, after loading the model and optimizer, I add the model.speaker_encoder.stl layer to the model and register its parameters with scheduled_optim._optimizer.add_param_group({'params': model.speaker_encoder.stl.parameters()}).
    This worked fine. I saved the model and optimizer at step 200k.
  3. Now I want to load the model at step 200k and use an additional loss for the training (the loading code is sketched below). When I do this it says:
    ValueError: loaded state dict has a different number of parameter groups
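The loading code in stage 3 looks roughly like this (a sketch; the real model class, paths, optimizer type, and hyperparameters differ):

model = Model()  # the model class now already contains speaker_encoder.stl
model.load_state_dict(torch.load('model_200k.pth'))

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # a single param group
optimizer.load_state_dict(torch.load('optimizer_200k.pth'))  # saved with two groups -> ValueError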

I reproduced it with a small example. Imagine I have a network which I train for a few epochs and then save the state of the model and optimizer.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
        # self.fc4 = nn.Linear(10, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)  # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        # x = self.fc4(x)
        return x

net = Net()
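The first run then trains and saves roughly like this (loss, optimizer, and paths are arbitrary stand-ins):

import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

# ... train for a few epochs ...

torch.save(net.state_dict(), 'model.pth')
torch.save(optimizer.state_dict(), 'optimizer.pth')  # saved with a single param group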

Then I load the model and optimizer and also add these lines:

net.fc4 = nn.Linear(10, 10)
optimizer.add_param_group({'params': [net.fc4.weight]})
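Put together, the second stage is (same placeholder paths; note that only fc4.weight is registered and fc4.bias is left out, matching the snippet above):

net = Net()
net.load_state_dict(torch.load('model.pth'))

optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
optimizer.load_state_dict(torch.load('optimizer.pth'))  # one group vs. one group: OK

net.fc4 = nn.Linear(10, 10)
optimizer.add_param_group({'params': [net.fc4.weight]})

# ... train a few more steps ...
torch.save(net.state_dict(), 'model_v2.pth')
torch.save(optimizer.state_dict(), 'optimizer_v2.pth')  # now saved with two param groups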

Then I train this model for a few steps and save the state of the model and optimizer.
After that I want to continue training,
so I load the new model with the fc4 linear layer already in the model. The model loads successfully,
but the optimizer throws an error:

ValueError: loaded state dict has a different number of parameter groups
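That matches the group-count check: optimizer_v2.pth was saved with two param groups, while the freshly constructed optimizer only has one when load_state_dict is called. Rebuilding the optimizer with the same two-group layout before restoring should work, along these lines (a sketch using the same placeholder names):

net = Net()
net.fc4 = nn.Linear(10, 10)
net.load_state_dict(torch.load('model_v2.pth'))

# group 0: the original parameters; group 1: [fc4.weight], the same layout as at save time
base_params = [p for name, p in net.named_parameters() if not name.startswith('fc4')]
optimizer = optim.SGD(base_params, lr=0.001, momentum=0.9)
optimizer.add_param_group({'params': [net.fc4.weight]})

optimizer.load_state_dict(torch.load('optimizer_v2.pth'))  # two groups vs. two groups: OK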