I am trying to convert some of the trained model weights to fp16 (for inference use only). The model was trained with fp32, but I’d like to convert some layers to fp16, hoping to increase the inference speed. I cannot do half() on the whole model since there are some layers that need to be in fp32, e.g. layernorm.
Can I iterate over the model’s named parameters, e.g.

for name, param in model.named_parameters():
    if name in converted_list:
        param.half()

Is this something that might work, or is there anything I am unaware of?
Your approach won’t work, as calling half() on the parameters is not applied in place, so the model won’t change (after fixing the indexing issue, since named_parameters is a method, not an attribute).
Call half() on the desired layers instead, or use the mixed-precision utility via torch.cuda.amp.autocast.
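To illustrate converting only selected layers, here is a minimal sketch; the toy model and the choice of nn.Linear as the fp16 target are assumptions for demonstration only:

```python
import torch
import torch.nn as nn

# Hypothetical toy model; in practice this would be your trained network.
model = nn.Sequential(
    nn.Linear(16, 16),
    nn.LayerNorm(16),
    nn.Linear(16, 4),
)

# Call half() on whole modules rather than on individual parameters.
# Here only the Linear layers are converted; LayerNorm stays in fp32.
for module in model.modules():
    if isinstance(module, nn.Linear):
        module.half()

print(model[0].weight.dtype)  # torch.float16
print(model[1].weight.dtype)  # torch.float32
```

Note that once the model mixes dtypes, you are responsible for casting the activations between the fp16 and fp32 submodules in the forward pass.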
Thanks a lot! Can torch.cuda.amp.autocast be used to change the model dtype for inference? We want to convert some of the layer weights and export the model to ONNX later, so I wonder if there is a way to change the model weights instead of just using amp at inference time.
After reading this, my understanding is that apex.amp can be used to convert some model weights to fp16 and generate a new model, while torch.cuda.amp.autocast will not change the dtype of the model parameters. If you explicitly want float16 parameters (without the master copies, as was used in the deprecated O2 level), then you would indeed need to call half() on the desired layers.
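A quick sketch of the autocast behavior described above. CPU autocast with bfloat16 is used here only so the example runs without a GPU; torch.cuda.amp.autocast behaves the same way with float16 on CUDA:

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 8)
x = torch.randn(2, 8)

# Inside the autocast region, eligible ops run in the lower-precision
# dtype, but the parameters themselves are left untouched.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    out = model(x)

print(model.weight.dtype)  # torch.float32 -- parameters unchanged
print(out.dtype)           # torch.bfloat16 -- compute ran in low precision
```

This is why autocast alone does not help when you want the exported (e.g. ONNX) model to actually store fp16 weights.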