How to use pretrained weights to train an extended network architecture

Hi, I am designing an architecture that contains two modules (GCN+RNN) in the class definition, and I will train it for around 100 epochs. I know I can save the model using torch.save and load a pretrained model using torch.load. But how can I use the weights from the pretrained model to initialize the same modules of an extended network? For example, say my new network contains three modules (GCN+RNN+CNN), with the fc layer as the final layer.

I have used the following code to update part of the model's state, and it works:

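import torch

# Load the checkpoint and pull out the pretrained weights
data = torch.load(PATH)
pretrained_states = data['model_state_dict']
model_dict = model.state_dict()
# Keep only the entries whose keys also exist in the new model
pretrained_states = {k: v for k, v in pretrained_states.items() if k in model_dict}

# Overwrite the matching entries, then load the merged state dict
model_dict.update(pretrained_states)
model.load_state_dict(model_dict)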
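
To make the partial-loading step concrete, here is a minimal self-contained sketch. The Base and Extended classes are hypothetical stand-ins (a Linear layer in place of the GCN, a GRU for the RNN, a Conv1d for the CNN), not my real modules. It also adds a shape check and strict=False, which are assumptions on top of the snippet above: a same-named tensor with a different shape is skipped rather than raising an error.

```python
import torch
import torch.nn as nn


class Base(nn.Module):
    """Hypothetical stand-in for the pretrained GCN+RNN network."""

    def __init__(self):
        super().__init__()
        self.gcn = nn.Linear(8, 16)  # placeholder for the real GCN module
        self.rnn = nn.GRU(16, 16, batch_first=True)
        self.fc = nn.Linear(16, 4)


class Extended(nn.Module):
    """Hypothetical stand-in for the extended GCN+RNN+CNN network."""

    def __init__(self):
        super().__init__()
        self.gcn = nn.Linear(8, 16)
        self.rnn = nn.GRU(16, 16, batch_first=True)
        self.cnn = nn.Conv1d(16, 16, kernel_size=3, padding=1)
        self.fc = nn.Linear(32, 4)  # wider input than Base.fc


pretrained = Base()
model = Extended()

pretrained_states = pretrained.state_dict()
model_dict = model.state_dict()

# Keep only entries whose name AND shape match the new model.
filtered = {
    k: v
    for k, v in pretrained_states.items()
    if k in model_dict and v.shape == model_dict[k].shape
}

# strict=False tolerates keys of the new model that are absent from `filtered`.
result = model.load_state_dict(filtered, strict=False)
missing = result.missing_keys  # parameters left at their fresh initialization
print(missing)
```

In this sketch fc.weight is skipped because its shape differs, while fc.bias has the same shape in both networks and is therefore copied; whether that is desirable depends on the actual model.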

But when I try to load the optimizer's state_dict, it gives an error:
"loaded state dict contains a parameter group that doesn't match the size of optimizer's group"

I am trying to update the optimizer state the same way, but it's not working.
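
The error can be reproduced without the real model. The following is a minimal sketch, assuming two hypothetical nn.Sequential stand-ins for the old and extended networks and Adam as the optimizer (my actual optimizer settings may differ). As far as I understand, the optimizer's state_dict refers to parameters by position in each parameter group rather than by name, which is probably why filtering by key, as I did for the model, does not carry over.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: the extended model has one extra layer,
# so its optimizer tracks more parameter tensors (6 instead of 4).
old_model = nn.Sequential(nn.Linear(8, 16), nn.Linear(16, 4))
new_model = nn.Sequential(nn.Linear(8, 16), nn.Linear(16, 16), nn.Linear(16, 4))

old_opt = torch.optim.Adam(old_model.parameters(), lr=1e-3)
new_opt = torch.optim.Adam(new_model.parameters(), lr=1e-3)

# The saved state lists parameters by integer index, not by name.
saved = old_opt.state_dict()
saved_param_count = len(saved["param_groups"][0]["params"])
new_param_count = len(new_opt.state_dict()["param_groups"][0]["params"])

try:
    # Loading the old optimizer state into the larger optimizer fails
    # because the parameter group sizes differ.
    new_opt.load_state_dict(saved)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(saved_param_count, new_param_count)
print(error_message)
```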