I have separately trained two models with amp in FP16 and saved checkpoints containing the state dicts for both models as well as the amp state.
Now, as a continuation of that training, I would like to load those models into an ensemble, freeze all of their gradients, and train a new model consisting of a few final layers that learns from the outputs of those models together with several additional inputs.
My current code looks something like this:

```python
from apex import amp
from apex.optimizers import FusedAdam

modelA = ModelA()
modelB = ModelB()
modelA.load_state_dict(checkpointA['model'])
modelB.load_state_dict(checkpointB['model'])

# Freeze both pretrained models
for a in modelA.parameters():
    a.requires_grad = False
for b in modelB.parameters():
    b.requires_grad = False

ensemble = EnsembleModel(modelA, modelB)

# Only pass the still-trainable parameters to the optimizer
optimizer = FusedAdam(filter(lambda p: p.requires_grad, ensemble.parameters()),
                      lr=learning_rate)
ensemble, optimizer = amp.initialize(ensemble, optimizer, opt_level=opt_level)

# *** perform training ***
```
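For context, `EnsembleModel` is just a thin wrapper around the two frozen models plus a small trainable head. A minimal sketch of what I mean (the concat-then-linear head and the dimensions `feature_dim`/`num_classes` are illustrative, not my exact architecture):

```python
import torch
import torch.nn as nn

class EnsembleModel(nn.Module):
    """Wraps two frozen models and adds a small trainable head on top."""
    def __init__(self, modelA, modelB, feature_dim=512, num_classes=10):
        super().__init__()
        self.modelA = modelA
        self.modelB = modelB
        # Only these final layers should receive gradient updates
        self.head = nn.Sequential(
            nn.Linear(2 * feature_dim, feature_dim),
            nn.ReLU(),
            nn.Linear(feature_dim, num_classes),
        )

    def forward(self, x):
        outA = self.modelA(x)
        outB = self.modelB(x)
        # Combine the frozen models' outputs and feed the trainable head
        return self.head(torch.cat([outA, outB], dim=1))
```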
Now to my questions:
In the checkpoints for modelA and modelB I also saved the amp state dict. Should those be loaded in this case, and if so, how?
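My naive attempt would be something like the sketch below, but since `amp.state_dict()` is global I don't see how two separately saved amp states could both be restored (the `'amp'` key is just what I used when saving):

```python
# After amp.initialize(); apex keeps a single global amp state,
# so it is unclear which of the two saved states (if either) applies here
amp.load_state_dict(checkpointA['amp'])  # ...or checkpointB['amp']?
```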
Does amp.initialize support models embedded inside an ensemble?
What if I want to unfreeze the lower models at a later point? Do I need to reinitialize both the optimizer and amp in that case?
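In other words, would something like the following be enough mid-training, or does amp also need to be told about the newly trainable parameters? Using `add_param_group` here is my guess at how to hand the unfrozen weights to the existing optimizer:

```python
# Unfreeze the two pretrained models mid-training
unfrozen = list(modelA.parameters()) + list(modelB.parameters())
for p in unfrozen:
    p.requires_grad = True

# Register the newly trainable parameters with the existing optimizer --
# or is a fresh FusedAdam plus another amp.initialize required here?
optimizer.add_param_group({'params': unfrozen})
```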