AMP initialization with fp16

I’d like to know how I should initialize the model if it is split into several modules.
For example:

model = def_model()  # backbone layers
model_loss = def_loss()  # FC classifier
params = list(model.parameters()) + list(model_loss.parameters())  # all the parameters
optimizer = torch.optim.SGD(params, lr)

Then, if I want to train the model with apex mixed precision (fp16), which approach is correct?

  1. Init all the sub-modules:
[model, model_loss], optimizer = amp.initialize([model, model_loss], optimizer, opt_level="O1")
  2. Only init the main module:
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

Since params already contains all the parameters, training runs with both approaches. But what will happen if I only include some of the modules in the initialize call?

We recommend switching to the native mixed-precision training utility via torch.cuda.amp, as described in this doc.
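
For reference, here is a minimal sketch of how your setup could look with the native utility (def_model, def_loss, loader, and the lr value are placeholders based on your example):

import torch

model = def_model().cuda()      # backbone layers
model_loss = def_loss().cuda()  # FC classifier
params = list(model.parameters()) + list(model_loss.parameters())
optimizer = torch.optim.SGD(params, lr=0.1)  # placeholder lr

scaler = torch.cuda.amp.GradScaler()

for data, target in loader:  # placeholder DataLoader
    optimizer.zero_grad()
    # autocast handles the casting for all modules used in the forward pass,
    # so no per-module initialization is needed
    with torch.cuda.amp.autocast():
        features = model(data.cuda())
        loss = model_loss(features, target.cuda())
    scaler.scale(loss).backward()  # scale the loss, then backward
    scaler.step(optimizer)         # unscale gradients and step
    scaler.update()                # update the scale factor for the next iteration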

Thanks for the reply.
Actually, we switched from torch.cuda.amp to apex for some internal reasons.
I studied the apex documentation but found no instructions on this point. Could you please provide some suggestions regarding the issue above? Thanks!

The modules passed to amp.initialize will be patched based on the used opt_level. If you don’t pass all modules to it, the casting might not work and/or the code might break in other unexpected ways.
Note that apex.amp won’t get any new features anymore, as current development is focused on the native implementation.
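
If you have to stay on apex for now, passing all modules as a list (your first approach) is the safe option. A minimal sketch under your setup (loader is a placeholder; amp.scale_loss is apex’s documented way to wrap the backward pass):

from apex import amp

[model, model_loss], optimizer = amp.initialize(
    [model, model_loss], optimizer, opt_level="O1"
)

for data, target in loader:  # placeholder DataLoader
    optimizer.zero_grad()
    features = model(data)
    loss = model_loss(features, target)
    # wrap the backward pass so the loss is scaled according to opt_level
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()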