I ran into an overflow problem when deploying a GAN model (trained in fp32) in fp16 precision, so I decided to give amp a try after seeing it. I'm wondering how G and D should be trained in an amp context. (Only G needs to be deployed in fp16; D is only used during training.)
The linked doc shows an example of how to use amp for multiple models, losses, and optimizers.
Did you follow these steps or did you use a custom approach?
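
For reference, here is a rough sketch of how that pattern might look for a GAN. It is only an illustration under a few assumptions: the tiny `G` and `D` networks, the layer sizes, the learning rates, and the `train_step` helper are all placeholders and not your actual setup. It uses a single `GradScaler` shared by both optimizers and calls `scaler.update()` once per iteration. On newer PyTorch releases `torch.cuda.amp.GradScaler` may emit a deprecation warning, and amp is simply disabled here when no GPU is available.

```python
import torch
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
use_amp = device.type == "cuda"  # assumption: fp16 autocast only on GPU

# Placeholder networks (hypothetical, stand-ins for the real G and D)
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4)).to(device)
D = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1)).to(device)

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

# BCEWithLogitsLoss is safe under autocast (plain BCELoss is not)
criterion = nn.BCEWithLogitsLoss()

# One scaler shared by both optimizers
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)


def train_step(real):
    """Run one D update and one G update; return both losses as floats."""
    batch = real.size(0)
    ones = torch.ones(batch, 1, device=device)
    zeros = torch.zeros(batch, 1, device=device)
    z = torch.randn(batch, 8, device=device)

    # Discriminator step: fake samples are detached so G gets no gradient
    opt_d.zero_grad(set_to_none=True)
    with torch.autocast(device_type=device.type, enabled=use_amp):
        fake = G(z)
        loss_d = criterion(D(real), ones) + criterion(D(fake.detach()), zeros)
    scaler.scale(loss_d).backward()
    scaler.step(opt_d)

    # Generator step: reuse the same fake batch, now with gradients into G
    opt_g.zero_grad(set_to_none=True)
    with torch.autocast(device_type=device.type, enabled=use_amp):
        loss_g = criterion(D(fake), ones)
    scaler.scale(loss_g).backward()
    scaler.step(opt_g)

    # Update the loss scale once, after both optimizers have stepped
    scaler.update()
    return loss_d.item(), loss_g.item()
```

Calling `train_step(torch.randn(5, 4, device=device))` runs one iteration on a random "real" batch. Whether training under amp also fixes the fp16 overflow at deployment time is something you would still need to check on your model.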