Can I use model.half() to replace autocast when doing inference?

I have a trained model (mainly based on maskrcnn-benchmark; its modules include conv, linear, and upsample) and use FP16 for inference. I used model.half() together with patch_norm_fp32 (borrowed from mmcv to stabilize BatchNorm) and it works well. So I wonder: 1) Can I just use the above method (i.e. model.half() plus patch_norm_fp32) instead of the autocast API, or are there any differences between the two? 2) In a few simple tests I found that autocast is faster than model.half(); is that normal?
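For reference, here is roughly what I'm doing, as a simplified sketch: the patch_norm_fp32 below is a stripped-down stand-in for mmcv's helper (it only handles BatchNorm), and torchvision's resnet18 stands in for my actual model.

```python
import torch
import torch.nn as nn
import torchvision

def patch_norm_fp32(module):
    # Keep norm layers in FP32 for numerical stability; a stripped-down
    # stand-in for mmcv's patch_norm_fp32 (no GroupNorm handling here).
    if isinstance(module, nn.modules.batchnorm._BatchNorm):
        module.float()
        orig_forward = module.forward
        # Cast FP16 activations up to FP32 for the norm, then back down.
        module.forward = lambda x, _f=orig_forward: _f(x.float()).half()
    for child in module.children():
        patch_norm_fp32(child)
    return module

# resnet18 is just a placeholder for the actual maskrcnn-benchmark model.
model = patch_norm_fp32(torchvision.models.resnet18().half()).cuda().eval()

with torch.no_grad():
    out = model(torch.randn(1, 3, 224, 224, device="cuda").half())
print(out.dtype)  # torch.float16
```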

torch.cuda.amp.autocast() uses an internal mapping of operations that have to use FP32 for numerical stability, as described here. If your model works fine (and the accuracy doesn’t decrease), I don’t see an argument against using model.half().
No, I wouldn’t expect autocast to be faster.
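For comparison, a minimal autocast inference sketch (again using resnet18 as a placeholder model): the parameters stay in FP32, and autocast casts per-op instead, so conv and linear run in FP16 while ops on the FP32 list keep FP32.

```python
import torch
import torchvision

# Weights stay in FP32; autocast decides the dtype per operation.
model = torchvision.models.resnet18().cuda().eval()

with torch.no_grad(), torch.cuda.amp.autocast():
    out = model(torch.randn(1, 3, 224, 224, device="cuda"))
print(out.dtype)  # torch.float16 (output of the final nn.Linear)
```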


Thank you for your answer. I will do some experiments to test the speed.
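In case it helps anyone later, a timing sketch along these lines should work; it uses torch.utils.benchmark and resnet18 as a placeholder model (the patch_norm_fp32 step is omitted here for brevity):

```python
import torch
import torchvision
from torch.utils import benchmark

x = torch.randn(8, 3, 224, 224, device="cuda")
model_half = torchvision.models.resnet18().half().cuda().eval()
model_amp = torchvision.models.resnet18().cuda().eval()

def run_half():
    with torch.no_grad():
        return model_half(x.half())

def run_amp():
    with torch.no_grad(), torch.cuda.amp.autocast():
        return model_amp(x)

# benchmark.Timer inserts CUDA synchronization and a warmup run,
# so the numbers are not skewed by asynchronous kernel launches.
for label, fn in [("model.half()", run_half), ("autocast", run_amp)]:
    print(label, benchmark.Timer(stmt="fn()", globals={"fn": fn}).timeit(100))
```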