I have a use case where I am required to change the network architecture in the middle of training, while using the Adam optimizer.
For example, let’s say I am first training a network on CIFAR-10, where the last layer is
self.fc = nn.Linear(1024, 10)
Once the test accuracy reaches a certain threshold (say 70%), I switch to the CIFAR-100 dataset and replace the last layer with
self.fc = nn.Linear(1024, 100).
My question is: since the Adam optimizer keeps running averages of the gradients (and squared gradients) for each parameter, what is the best way to update the optimizer so that the running averages for the previous layers are preserved?
In other words, if I simply re-run
optim.Adam(model.parameters()), will the running averages for all layers be reset as well?
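To make the question concrete, here is a minimal sketch (the tiny backbone and head are hypothetical stand-ins for my actual model) showing the two options I am weighing: re-creating the optimizer, which starts with empty per-parameter state, versus keeping the same optimizer and swapping only the head’s param group:

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical minimal setup: a backbone plus a replaceable classifier head.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 1024), nn.ReLU())
fc = nn.Linear(1024, 10)  # CIFAR-10 head

# Put the head in its own param group so it can be swapped out later.
optimizer = optim.Adam([
    {"params": backbone.parameters()},
    {"params": fc.parameters()},
], lr=1e-3)

# One training step so Adam builds its running averages (exp_avg, exp_avg_sq).
x = torch.randn(4, 3, 32, 32)
fc(backbone(x)).sum().backward()
optimizer.step()
print(len(optimizer.state))  # 4: backbone weight/bias + head weight/bias

# Option A: re-create the optimizer -> per-parameter state starts empty,
# i.e. the running averages for ALL layers are gone.
fresh = optim.Adam(backbone.parameters(), lr=1e-3)
print(len(fresh.state))  # 0 until its first step()

# Option B: keep the same optimizer, drop the old head's param group and
# register the new head. The backbone's entries in optimizer.state are
# untouched; the old head's state is simply orphaned.
fc = nn.Linear(1024, 100)  # CIFAR-100 head
optimizer.param_groups.pop()                       # remove the old head group
optimizer.add_param_group({"params": fc.parameters()})
print(all(p in optimizer.state for p in backbone.parameters()))  # True
```

Popping `optimizer.param_groups` directly is admittedly hacky, so I am unsure whether Option B is the intended way to do this, or whether there is a cleaner supported approach.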