Call half() on selected model weights

Hi,

I am trying to convert some of the trained model weights to fp16 (for inference use only). The model was trained with fp32, but I’d like to convert some layers to fp16, hoping to increase the inference speed. I cannot call half() on the whole model, since some layers need to stay in fp32, e.g. layernorm.

Can I iterate over the model’s named parameters, something like this:

for name, param in model.named_parameters():
  if name in converted_list:
     model.named_parameters[name].half()

Is this something that might work, or is there anything I am unaware of?

Thanks!

Your approach won’t work, as calling half() on the parameters will not be applied in-place, so the model won’t change (even after fixing the indexing issue, since named_parameters is a method).
Instead, call half() on the layers themselves or use the mixed-precision utility torch.cuda.amp.autocast.
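In case it helps, here is a minimal sketch of the "call half() on the layers" approach; the small Sequential model below is just a placeholder for illustration, not your actual architecture:

import torch
import torch.nn as nn

# Placeholder model for illustration only.
model = nn.Sequential(
    nn.Linear(16, 16),
    nn.LayerNorm(16),
    nn.Linear(16, 4),
)

# Convert the whole model to fp16, then cast the layers that should stay
# in fp32 (e.g. LayerNorm) back. Module.half()/.float() modify the module
# in-place, unlike calling .half() on an individual Parameter.
model.half()
for module in model.modules():
    if isinstance(module, nn.LayerNorm):
        module.float()

# Check the resulting dtypes.
for name, param in model.named_parameters():
    print(name, param.dtype)

Depending on the layers and device, you may also need to cast the activations explicitly in forward(), since the fp16 layers will produce fp16 outputs while the fp32 layers expect fp32 inputs.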

Thanks a lot! Can torch.cuda.amp.autocast be used to change the model dtype for inference? We want to convert some of the layer weights and export the model to ONNX later, so I wonder if there is a way to change the model weights instead of just using amp for inference.

After reading this, my understanding is that apex.amp can be used to convert some model weights to fp16 and generate a new model, while torch.cuda.amp.autocast cannot?

torch.cuda.amp.autocast will not change the dtype of the model parameters, so if you explicitly want float16 parameters (without the fp32 master copies that the deprecated apex.amp O2 level maintained), then you would indeed need to call half() on the desired layers.
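A small sketch of that difference (it assumes a CUDA device and uses a plain Linear layer as a stand-in for your model):

import torch
import torch.nn as nn

model = nn.Linear(16, 4).cuda()  # placeholder layer, parameters are fp32
x = torch.randn(8, 16, device="cuda")

with torch.cuda.amp.autocast():
    out = model(x)

print(model.weight.dtype)  # torch.float32 -- autocast leaves the parameters untouched
print(out.dtype)           # torch.float16 -- only the op ran in half precision

# For an ONNX export with real fp16 weights, convert the desired layers instead:
model.half()
print(model.weight.dtype)  # torch.float16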
