AMP during inference

Hi,
Does AMP speed up inference, or is it intended only for training? I used to think the reduced precision applied only to the gradients (and would therefore be irrelevant for inference), but I've seen mentions of autocast choosing lower-precision implementations for the forward pass as well, which sounds like it could benefit inference-only use cases too.
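
For context, this is the inference-only pattern I'm asking about (a minimal sketch; `MyModel` and `x` are just placeholders for an arbitrary CUDA model and input batch):

```python
import torch

# Placeholder model and input -- any fp32 CUDA model would do here
model = MyModel().cuda().eval()
x = torch.randn(8, 3, 224, 224, device="cuda")

# autocast wraps only the forward pass; no GradScaler involved,
# since there is no backward pass during inference
with torch.no_grad(), torch.autocast(device_type="cuda", dtype=torch.float16):
    out = model(x)
```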

Secondly, just to make sure: is it correct that AMP might cause a slight accuracy drop compared to pure fp32, but considerably less than converting both the model and the data to fp16 outright?
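
By "moving both the model and the data to fp16" I mean something like this (again a sketch, reusing the placeholder names from above):

```python
# Full fp16 conversion: every op runs in half precision,
# including ops that autocast would keep in fp32
model_fp16 = model.half()
x_fp16 = x.half()

with torch.no_grad():
    out = model_fp16(x_fp16)
```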