apex.amp is deprecated, and you should use the native mixed-precision utilities via torch.cuda.amp as described here.
With that said, yes, it's possible to activate autocast for inference only. The docs give some examples, and in particular you can skip the training utilities (e.g. the GradScaler), since no gradients are scaled during inference.
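
Here's a minimal sketch of inference-only autocast (the model and tensor shapes are just placeholders for illustration):

```python
import torch
import torch.nn as nn

# Placeholder model and input; no GradScaler is needed because there is
# no backward pass or optimizer step during inference.
model = nn.Linear(128, 64).cuda().eval()
x = torch.randn(32, 128, device="cuda")

with torch.no_grad():
    with torch.cuda.amp.autocast():
        out = model(x)  # forward pass runs in mixed precision

print(out.dtype)  # torch.float16 for autocast-eligible ops such as nn.Linear
```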