`apex.amp` is deprecated and you should use the native mixed-precision utility via `torch.cuda.amp` as described here.
With that being said, yes, it's possible to activate `autocast` for inference only. The docs give you some examples, and in particular you can skip the training utilities (e.g. the `GradScaler`).
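As a rough illustration of inference-only autocast, here is a minimal sketch. The toy model, tensor shapes, and variable names are made up for the example. It uses `torch.autocast`, the device-generic form of the context manager, rather than `torch.cuda.amp.autocast` so that it also runs on a machine without a GPU. The bfloat16 choice on CPU is an assumption for that fallback; on CUDA, float16 is the usual choice.

```python
import torch
import torch.nn as nn

# Hypothetical toy model standing in for your trained network.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

device = "cuda" if torch.cuda.is_available() else "cpu"
# Assumption: float16 on CUDA, bfloat16 for the CPU fallback.
amp_dtype = torch.float16 if device == "cuda" else torch.bfloat16

model = model.to(device).eval()
inputs = torch.randn(8, 16, device=device)

# No GradScaler and no optimizer here: nothing is backpropagated,
# so there are no gradients to scale.
with torch.no_grad(), torch.autocast(device_type=device, dtype=amp_dtype):
    outputs = model(inputs)

print(outputs.dtype)  # dtype of the autocast output
```

The parameters themselves stay in float32; autocast only casts inputs to eligible ops (such as the linear layers) inside the context, so you can cast `outputs` back with `.float()` if later code expects full precision.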