I'm trying to use amp with PyTorch 1.6 to speed up my training code.
But I run into a problem when I use nn.DataParallel.
I printed some intermediate variables and found that a tensor is float16 on one GPU, but float32 on two GPUs.
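Roughly, the setup looks like this (a simplified sketch; the real model is bigger):

```python
import torch
import torch.nn as nn
from torch.cuda.amp import autocast

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(8, 8)

    def forward(self, x):
        out = self.linear(x)
        print(out.dtype)  # torch.float16 on one GPU, torch.float32 on two GPUs
        return out

model = nn.DataParallel(Net().cuda())

with autocast():
    output = model(torch.randn(4, 8, device="cuda"))
```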
Does nn.DataParallel support mixed-precision training?
Yes, amp is usable with nn.DataParallel, as described here.
I guess you might have missed this note about the @autocast() decorator for the forward method if you are using nn.DataParallel.
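For example (a minimal sketch reusing the simplified Net from the question), decorating forward keeps autocast enabled inside each replica, since the autocast state is thread-local and nn.DataParallel runs each replica in its own thread:

```python
import torch
import torch.nn as nn
from torch.cuda.amp import autocast

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(8, 8)

    @autocast()  # re-enables autocast inside each replica's thread
    def forward(self, x):
        out = self.linear(x)
        print(out.dtype)  # now torch.float16 on every replica
        return out

model = nn.DataParallel(Net().cuda())

with autocast():
    output = model(torch.randn(4, 8, device="cuda"))
```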
In cases where multi-GPU use is set by command line arguments, so the script has to work in single-GPU as well as multi-GPU scenarios, can we just keep the @autocast() decorator on forward for single-GPU runs as well?
Yes, that is correct.
The forward method has to be decorated for the nn.DataParallel and DDP with multiple GPUs per process use cases, not for single-GPU or DDP with a single GPU per process (the recommended and fastest approach).
You should be able to do it anyway without any disadvantages, but let us know if it breaks anything.
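For instance, something like this rough sketch should work in both cases (reusing the Net with the decorated forward from above; the use_data_parallel flag stands in for whatever command line argument you use):

```python
import torch
import torch.nn as nn
from torch.cuda.amp import autocast, GradScaler

use_data_parallel = torch.cuda.device_count() > 1  # or set from a CLI argument

model = Net().cuda()               # Net with the @autocast() forward from above
if use_data_parallel:
    model = nn.DataParallel(model)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = GradScaler()

data = torch.randn(4, 8, device="cuda")
target = torch.randn(4, 8, device="cuda")

optimizer.zero_grad()
with autocast():                   # wrap the forward pass and loss as usual
    output = model(data)
    loss = nn.functional.mse_loss(output, target)
scaler.scale(loss).backward()      # scale the loss to avoid float16 gradient underflow
scaler.step(optimizer)
scaler.update()
```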