I am training a model with SGD(lr=0.001, momentum=0.9) and saving it after training, so that I can then adapt it to a subset of the training speakers by freezing all the weights and training only the last layer. How should I handle loading the optimizer and restricting its parameters to just the last layer? I know how to freeze the layers, but the new optimizer's size then doesn't match the loaded one:

```
# Load the saved model and optimizer state_dicts
checkpoint = torch.load(modelPath, map_location=device)
# Apply the pretrained model weights
model.load_state_dict(checkpoint['model_state_dict'])
# Freeze the weights of all layers
for param in model.parameters():
    param.requires_grad = False
# Re-enable weight updates in the last layer only
for param in model.output_layer.parameters():
    param.requires_grad = True
# Define the optimizer over the trainable parameters only
optimizer = optim.SGD(filter(lambda p: p.requires_grad, model.parameters()), lr=0.001, momentum=0.9)
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])  # fails: doesn't match the size of the new optimizer
```
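For context, here is a minimal self-contained repro of the freezing setup without the checkpoint loading. The model architecture and layer sizes are made up (a toy two-layer net); only the attribute name `output_layer` mirrors my real code. This is the state I end up in: a fresh optimizer that only knows about the last layer's parameters, which is why the saved optimizer state (momentum buffers for *all* parameters) no longer lines up.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Toy stand-in for the real model (hypothetical sizes), just to
# illustrate the freezing pattern from the snippet above.
class ToyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(10, 20)
        self.output_layer = nn.Linear(20, 5)

    def forward(self, x):
        return self.output_layer(torch.relu(self.hidden(x)))

model = ToyNet()

# Freeze everything, then re-enable only the last layer.
for param in model.parameters():
    param.requires_grad = False
for param in model.output_layer.parameters():
    param.requires_grad = True

# Fresh optimizer over the trainable parameters only.
optimizer = optim.SGD(
    (p for p in model.parameters() if p.requires_grad),
    lr=0.001, momentum=0.9,
)

# Only output_layer's weight and bias are trainable: 20*5 + 5 = 105.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(trainable)  # → 105
print(len(optimizer.param_groups[0]['params']))  # → 2 (weight + bias)
```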