Is it possible to enable apex (opt_level=O1) only during inference on a model trained purely in FP32 (without apex)?

apex.amp is deprecated; you should use the native mixed-precision utility via torch.cuda.amp instead, as described here.

That being said, yes, it's possible to activate autocast for inference only. The docs give some examples, and in particular you can skip the training utilities (e.g. the GradScaler), since they are only needed for the backward pass and optimizer step.
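A minimal sketch of inference-only autocast might look like this. The model here is a placeholder stand-in for your FP32-trained network; `torch.autocast` is the device-agnostic form of the same utility (on CUDA it is equivalent to `torch.cuda.amp.autocast`):

```python
import torch
import torch.nn as nn

# Placeholder for a model trained purely in FP32 (without apex).
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device).eval()

x = torch.randn(8, 16, device=device)

# Inference only: wrap the forward pass in autocast.
# No GradScaler is needed, since there is no backward pass.
with torch.no_grad():
    with torch.autocast(device_type=device):
        out = model(x)

print(out.shape)  # torch.Size([8, 4])
```

Inside the context, eligible ops (e.g. the linear layers) run in a lower precision (float16 on CUDA, bfloat16 on CPU), while the model's stored parameters stay in FP32.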