Training Conv nets with half precision

Do the devs have any specific advice on training with half precision?

I converted my model to run with cuda().half(), but it does not seem to converge.

Is there something I should be aware of?

Thank you!

This week Amazon set up AWS P3 instances with Tesla V100 cards, which support half-precision training, so I am reviving this old topic.

I only have experience with fp training on Titan X. If anyone has insight into training with half precision on Tesla V100 or P100 cards, please share it with us!

@FuriouslyCurious, did you manage to run anything at all on the v100?

Please see here for tips on training with mixed precision
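
For anyone who wants a quick feel for why a plain .half() conversion can stall, here is a small sketch. The linked tips are not quoted in this thread, so I am assuming they cover the two commonly cited mixed-precision ideas: loss scaling and keeping an fp32 copy of the weights. The sketch uses only Python's standard library to round numbers to fp16 and fp32. The values `grad`, `loss_scale`, `weight`, and `update` are made up for illustration and do not come from any real model.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Hypothetical values, chosen only to illustrate the rounding behaviour.
grad = 1e-8          # a very small gradient
loss_scale = 1024.0  # a power-of-two loss scale

# Problem 1: the gradient is below the smallest positive fp16 value
# (about 6e-8), so storing it in fp16 flushes it to zero.
unscaled = to_fp16(grad)

# Loss scaling: multiply before the fp16 cast, divide again in higher precision.
scaled = to_fp16(grad * loss_scale)
recovered = scaled / loss_scale

# Problem 2: near 1.0 the gap between adjacent fp16 values is about 1e-3,
# so adding a smaller update to an fp16 weight changes nothing.
weight = 1.0
update = 1e-4
fp16_weight_after = to_fp16(to_fp16(weight) + to_fp16(update))

# The same update applied to an fp32 copy of the weight is kept.
fp32_weight_after = to_fp32(to_fp32(weight) + to_fp32(update))

print("fp16 gradient without scaling:", unscaled)
print("gradient recovered with scaling:", recovered)
print("fp16 weight after update:", fp16_weight_after)
print("fp32 weight after update:", fp32_weight_after)
```

Whether either effect explains the convergence problem reported above is a guess on my part. It is only meant to show what can go wrong when gradients and weight updates are all kept in fp16.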

Those tips are very interesting. Does anyone have examples of implementing them in PyTorch? Are there any examples of successfully training in half precision with PyTorch, especially on standard architectures like ResNet?

