Can I use model.half() to replace autocast when doing inference?

I have a trained model (mainly based on maskrcnn-benchmark; its modules include conv, linear, and upsample layers) and I use FP16 when doing inference. I used model.half() together with patch_norm_fp32 (borrowed from mmcv to keep BN stable), and it works well. So I wonder: 1) Can I just use this method (i.e. model.half() and patch_norm_fp32) instead of the autocast API, or is there any difference between the two? 2) I ran a few simple tests and found that autocast is faster than model.half(). Is that normal?
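For readers who have not seen the mmcv helper, the approach described above can be sketched roughly as follows. This is not mmcv's actual patch_norm_fp32 code. `half_with_fp32_norm` is a hypothetical name, and the hooks are just one simple way to get the same effect: the model is cast to FP16, normalization layers are cast back to FP32, and their inputs and outputs are converted at the layer boundary.

```python
import torch
import torch.nn as nn

# Norm layer types kept in FP32 (an assumption; extend as needed).
NORM_TYPES = (nn.modules.batchnorm._BatchNorm, nn.GroupNorm)


def half_with_fp32_norm(model: nn.Module) -> nn.Module:
    """Hypothetical helper: FP16 model whose norm layers compute in FP32."""
    model.half()
    for module in model.modules():
        if isinstance(module, NORM_TYPES):
            # Parameters and running statistics go back to FP32.
            module.float()
            # Cast incoming activations to FP32 before the norm layer runs.
            module.register_forward_pre_hook(
                lambda m, inputs: tuple(x.float() for x in inputs)
            )
            # Cast the result back to FP16 for the following layers.
            module.register_forward_hook(
                lambda m, inputs, output: output.half()
            )
    return model


net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU())
net = half_with_fp32_norm(net).eval()

# FP16 convolutions are mainly a GPU feature, so only run a forward pass there.
if torch.cuda.is_available():
    net = net.cuda()
    with torch.no_grad():
        y = net(torch.randn(1, 3, 32, 32, device="cuda").half())
```

After the call, the conv weights are FP16 while the BN weights and running statistics stay FP32.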

torch.cuda.amp.autocast() uses an internal mapping of operations that have to run in FP32 for numerical stability, as described here. If your model works fine with model.half() (and the accuracy doesn’t decrease), I don’t see an argument against using it.
No, I wouldn’t expect autocast to be faster.
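For comparison, a minimal sketch of the autocast route might look like the following. The tiny model and the `infer_with_autocast` name are made up for illustration, and the code assumes a PyTorch version that provides `torch.autocast` (the device-generic form of `torch.cuda.amp.autocast`). The model's weights stay in FP32, and autocast picks the dtype per operation inside the context.

```python
import torch
import torch.nn as nn


def infer_with_autocast(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: run one forward pass under autocast."""
    # Autocast is only enabled for CUDA inputs; on CPU this is a plain FP32 pass.
    with torch.no_grad(), torch.autocast(
        device_type="cuda", dtype=torch.float16, enabled=x.is_cuda
    ):
        return model(x)


device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU()
).eval().to(device)

x = torch.randn(1, 3, 32, 32, device=device)
out = infer_with_autocast(model, x)
```

Unlike model.half(), nothing in the model is converted permanently, so the same FP32 checkpoint can still be used for full-precision inference.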

Thank you for your answer. I will run some experiments to test the speed.
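When timing the two variants, keep in mind that CUDA kernels launch asynchronously, so a fair measurement needs a few warm-up iterations and a torch.cuda.synchronize() before reading the clock. A rough sketch (`time_inference` is a hypothetical helper, and the warm-up and iteration counts are arbitrary):

```python
import time

import torch


def time_inference(fn, warmup: int = 10, iters: int = 50) -> float:
    """Hypothetical helper: mean wall-clock seconds per call of fn()."""
    use_cuda = torch.cuda.is_available()
    with torch.no_grad():
        # Warm-up runs absorb one-time costs such as cuDNN algorithm selection.
        for _ in range(warmup):
            fn()
        if use_cuda:
            torch.cuda.synchronize()  # wait for queued GPU work to finish
        start = time.perf_counter()
        for _ in range(iters):
            fn()
        if use_cuda:
            torch.cuda.synchronize()  # include all GPU work in the timing
        elapsed = time.perf_counter() - start
    return elapsed / iters


# Example with a trivial workload; replace the lambda with a model call.
mean_seconds = time_inference(lambda: torch.ones(64, 64) @ torch.ones(64, 64))
```

Passing the model.half() forward pass and the autocast forward pass through the same helper, with identical inputs, gives numbers that can be compared directly.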