How to modify Models for Mixed Precision Training

I’ve currently written a model that works in float32. However, I’d like to try mixed precision training because some users don’t have enough memory. After following this recipe: https://pytorch.org/tutorials/recipes/recipes/amp_recipe.html and adding the GradScaler and the with autocast context, I noticed that no memory was being saved, and when checking the parameters inside the with autocast section of my code, they were all still FP32. Are there any other steps I need to take to modify the model code to work with mixed precision, and if so, could you link resources with steps on how to modify my model?

The parameters will still be stored in FP32, and the memory savings come from the activations, which can be stored in FP16 (if the operation is safe to use in FP16).
The amp examples are a good starting point and show how to use autocast.
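
For reference, here is a minimal sketch of the pattern from that recipe (Net, loader, and criterion are placeholder names for your own model, data, and loss). Inside the autocast region you can check that the parameters stay in FP32 while the activations come out in FP16, which is the expected behavior:

```python
import torch

model = Net().cuda()                      # placeholder model; parameters are created in FP32
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()

for data, target in loader:               # placeholder DataLoader
    data, target = data.cuda(), target.cuda()
    optimizer.zero_grad()

    with torch.cuda.amp.autocast():
        output = model(data)              # autocast-eligible ops run in FP16
        loss = criterion(output, target)

        # the parameters stay in FP32; only the activations are FP16
        print(next(model.parameters()).dtype)  # torch.float32
        print(output.dtype)                    # torch.float16

    scaler.scale(loss).backward()         # scale the loss to avoid gradient underflow
    scaler.step(optimizer)                # unscales grads, skips the step if inf/NaN are found
    scaler.update()
```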

Thanks for your response! If I wrote a custom loss function that only takes FP32 inputs, should I use the normal backward() and step() instead of scaling the loss?

No, the loss scaling is applied to avoid underflows in the gradient calculation, so you should still use it.

Should I convert the loss function’s outputs (which are in FP32) back down to FP16, then apply scaling?

No, you can keep them in FP32.
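
To make that concrete, here is a sketch of how a custom FP32-only loss could fit into the same loop (my_fp32_loss is just a made-up example): the loss stays in FP32, is not cast back to FP16, and is still passed through the scaler.

```python
def my_fp32_loss(pred, target):
    # hypothetical custom loss that expects FP32 inputs
    pred = pred.float()                   # cast the FP16 activations up to FP32
    target = target.float()
    return ((pred - target) ** 2).mean()  # stays in FP32

with torch.cuda.amp.autocast():
    output = model(data)                  # FP16 activations

loss = my_fp32_loss(output, target)       # FP32 loss; no need to cast it down to FP16
scaler.scale(loss).backward()             # keep the scaling to avoid gradient underflow
scaler.step(optimizer)
scaler.update()
```

The dtype of the loss itself doesn’t matter here: the underflow the scaler protects against happens in the FP16 parts of the backward pass, so the scaling is needed regardless of whether the loss is FP16 or FP32.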