Hi, I’m interested in the pipeline parallelism capability of the PiPPy package. While exploring the docs, I couldn’t find instructions on how mixed precision training is supported with PiPPy. Say I want to train a model in fp16 using pipeline parallelism: how should I do it correctly?
I would expect that AMP would be orthogonal to PiPPy. cc: @kwen2501
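For reference, here is a minimal sketch of what AMP looks like in plain PyTorch, with no PiPPy involved. The toy model, the `train_step_amp` helper, and the MSE loss are all hypothetical and only for illustration. If the two really are orthogonal, the pipelined module would presumably take the place of `model` here, but that is an assumption and not something this snippet verifies.

```python
import torch
from torch import nn


def train_step_amp(model, optimizer, scaler, inputs, targets, device_type, amp_dtype):
    """Run one mixed-precision training step and return the loss as a float."""
    optimizer.zero_grad(set_to_none=True)
    # Forward pass runs under autocast, so eligible ops use the lower-precision dtype.
    with torch.autocast(device_type=device_type, dtype=amp_dtype):
        outputs = model(inputs)
        # Cast back to fp32 so the loss is computed in full precision.
        loss = nn.functional.mse_loss(outputs.float(), targets)
    # The scaler multiplies the loss to avoid fp16 gradient underflow;
    # when it is disabled, these calls fall through to the plain equivalents.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()


use_cuda = torch.cuda.is_available()
device_type = "cuda" if use_cuda else "cpu"
# fp16 on GPU; bf16 on CPU so the sketch also runs without a GPU.
amp_dtype = torch.float16 if use_cuda else torch.bfloat16

# Hypothetical toy model standing in for the real network.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1)).to(device_type)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# Loss scaling is only needed for fp16, so it is enabled only on CUDA.
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)

inputs = torch.randn(4, 8, device=device_type)
targets = torch.randn(4, 1, device=device_type)
loss_value = train_step_amp(model, optimizer, scaler, inputs, targets, device_type, amp_dtype)
print(f"loss: {loss_value:.4f}")
```

Note that the parameters stay in fp32 here; autocast only changes the dtype used inside individual ops, which is different from calling `model.half()`.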
In theory I would also expect them to be orthogonal. But since PiPPy does a lot of additional things, such as graph tracing, I wonder how compatible they are in practice.