I have a use case where I am required to change the network architecture in the middle of training, while using the Adam optimizer.
For example, let’s say I am first training a network on CIFAR-10, where the last layer is
self.fc = nn.Linear(1024, 10)
Once the test accuracy reaches a certain threshold (say 70%), I switch to the CIFAR-100 dataset and replace the last layer with
self.fc = nn.Linear(1024, 100).
My question is: since the Adam optimizer keeps running averages of the gradients (and squared gradients) for each parameter, what is the best way to update the optimizer so that the running averages for the previous layers are preserved?
In other words, if I simply re-run
optim.Adam(model.parameters()), will the running averages for all layers be reset as well?
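To make the question concrete, here is a minimal sketch (the tiny backbone and head are hypothetical stand-ins for my actual model) showing the two options I am weighing: re-creating the optimizer, which starts with empty per-parameter state, versus keeping the same optimizer and swapping only the head’s param group:

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical minimal setup: a backbone plus a replaceable classifier head.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 1024), nn.ReLU())
fc = nn.Linear(1024, 10)  # CIFAR-10 head

# Put the head in its own param group so it can be swapped out later.
optimizer = optim.Adam([
    {"params": backbone.parameters()},
    {"params": fc.parameters()},
], lr=1e-3)

# One training step so Adam builds its running averages (exp_avg, exp_avg_sq).
x = torch.randn(4, 3, 32, 32)
fc(backbone(x)).sum().backward()
optimizer.step()
print(len(optimizer.state))  # 4: backbone weight/bias + head weight/bias

# Option A: re-create the optimizer -> per-parameter state starts empty,
# i.e. the running averages for ALL layers are gone.
fresh = optim.Adam(backbone.parameters(), lr=1e-3)
print(len(fresh.state))  # 0 until its first step()

# Option B: keep the same optimizer, drop the old head's param group and
# register the new head. The backbone's entries in optimizer.state are
# untouched; the old head's state is simply orphaned.
fc = nn.Linear(1024, 100)  # CIFAR-100 head
optimizer.param_groups.pop()                       # remove the old head group
optimizer.add_param_group({"params": fc.parameters()})
print(all(p in optimizer.state for p in backbone.parameters()))  # True
```

Popping `optimizer.param_groups` directly is admittedly hacky, so I am unsure whether Option B is the intended way to do this, or whether there is a cleaner supported approach.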