AMP initialization with fp16

I’d like to know how I should initialize the model if it is split into several modules.
For example:

model = def_model()  # backbone layers
model_loss = def_loss()  # FC classifier
params = list(model.parameters()) + list(model_loss.parameters())  # all the parameters
optimizer = torch.optim.SGD(params, lr)

Then, if I want to train the model with apex mixed precision (fp16), which approach is correct?

  1. Init all the sub-modules:
[model, model_loss], optimizer = amp.initialize([model, model_loss], optimizer, opt_level="O1")
  2. Only init the main module:
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

Since params already contains all the parameters, training runs with both approaches. But what will happen if I only include some of the modules in the initialize call?

We recommend switching to the native mixed-precision training utility via torch.cuda.amp, as described in this doc.
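
For reference, here is a minimal sketch of how your setup could look with the native utility (def_model, def_loss, loader, and the lr value are placeholders based on your example):

import torch

model = def_model().cuda()      # backbone layers
model_loss = def_loss().cuda()  # FC classifier
params = list(model.parameters()) + list(model_loss.parameters())
optimizer = torch.optim.SGD(params, lr=0.1)  # placeholder lr

scaler = torch.cuda.amp.GradScaler()

for data, target in loader:  # placeholder DataLoader
    optimizer.zero_grad()
    # autocast handles the casting for all modules used in the forward pass,
    # so no per-module initialization is needed
    with torch.cuda.amp.autocast():
        features = model(data.cuda())
        loss = model_loss(features, target.cuda())
    scaler.scale(loss).backward()  # scale the loss, then backward
    scaler.step(optimizer)         # unscale gradients and step
    scaler.update()                # update the scale factor for the next iteration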

Thanks for the reply.
Actually, we switched from torch.cuda.amp to apex for some internal reasons.
I studied the apex documentation but found no instructions on this point. Could you please provide some suggestions regarding the issue above? Thanks!

The modules passed to amp.initialize will be patched based on the used opt_level. If you don’t pass all modules to it, the casting might not work and/or the code might break in other unexpected ways.
Note that apex.amp won’t get any new features anymore, as current development is focused on the native implementation.
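
If you have to stay on apex for now, passing all modules as a list (your first approach) is the safe option. A minimal sketch under your setup (loader is a placeholder; amp.scale_loss is apex’s documented way to wrap the backward pass):

from apex import amp

[model, model_loss], optimizer = amp.initialize(
    [model, model_loss], optimizer, opt_level="O1"
)

for data, target in loader:  # placeholder DataLoader
    optimizer.zero_grad()
    features = model(data)
    loss = model_loss(features, target)
    # wrap the backward pass so the loss is scaled according to opt_level
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()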