Loading saved Amp models into ensemble

I have separately trained several models with amp in FP16 and saved the state dicts for both the models and amp.

Now, as a continuation of that training, I would like to load those models into an ensemble, freeze all their gradients, and train a new model with a few final layers that learns from the outputs of those models plus several additional inputs.

My current code looks something like this:

modelA = ModelA()
modelB = ModelB()

# freeze the pretrained models
for a in modelA.parameters():
    a.requires_grad = False
for b in modelB.parameters():
    b.requires_grad = False

ensemble = EnsembleModel(modelA, modelB)
optimizer = FusedAdam(filter(lambda p: p.requires_grad, ensemble.parameters()), lr=learning_rate)

ensemble, optimizer = amp.initialize(ensemble, optimizer, opt_level=opt_level)

***perform training***

Now to my questions:

1. In the checkpoints for modelA and modelB I also saved the amp state dict. Should those be loaded in this case, and how?
2. Does amp.initialize support models embedded inside an ensemble?
3. If I want to unfreeze the lower models at a later point, do I need to reinitialize both the optimizer and amp?

This wouldn't be necessary if you are not planning to finetune the pretrained models.
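If you do unfreeze the lower models later, you don't have to rebuild everything; one option is to flip requires_grad back on and register those parameters with the existing optimizer via add_param_group. A minimal sketch with stand-in Linear modules (the layer sizes and learning rates are arbitrary):

```python
import torch

# stand-ins for the pretrained submodel and the new head
frozen = torch.nn.Linear(4, 4)
for p in frozen.parameters():
    p.requires_grad = False

head = torch.nn.Linear(4, 2)

# the optimizer initially only sees the trainable head
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

# later: unfreeze the lower model and add its parameters as a new param group,
# optionally with a smaller learning rate for finetuning
for p in frozen.parameters():
    p.requires_grad = True
optimizer.add_param_group({"params": frozen.parameters(), "lr": 1e-4})
```

Since the optimizer keeps its existing state for the head's parameters, no reinitialization is needed.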

We recommend using the native amp implementation via torch.cuda.amp instead of apex/amp. Since there is no amp.initialize method, it should just work, but let us know if you encounter any issues.

The amp state_dict stores the loss scaler values etc., which is useful for continuing the training with the same setup. If you initialize a new scaler, the first iterations might be skipped if the scaling factor is too high, which wouldn't necessarily happen if you reuse the scaling factor from the last iteration of the pretraining.
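In native amp the same idea applies to the GradScaler: save its state_dict alongside the model and optimizer, and restore it when resuming. A sketch (the checkpoint filename is illustrative):

```python
import torch
from torch.cuda.amp import GradScaler

scaler = GradScaler(enabled=torch.cuda.is_available())

# ... train, then checkpoint the scaler together with model/optimizer states
checkpoint = {"scaler": scaler.state_dict()}
torch.save(checkpoint, "amp_checkpoint.pt")

# later, in the new run: restore the scale factor from the previous training
new_scaler = GradScaler(enabled=torch.cuda.is_available())
new_scaler.load_state_dict(torch.load("amp_checkpoint.pt")["scaler"])
```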

I see, thanks for the help!