Can amp training use torch.nn.DataParallel at the same time?

I am trying to use amp with PyTorch 1.6 to speed up my training code,
but I run into a problem when I use nn.DataParallel.

I printed the dtypes of some intermediate variables and found that the tensors are float16 on one GPU, but float32 on two GPUs.
Does a DataParallel model support mixed-precision training?

On one GPU:

fake_image_orig: torch.float16
gen loss: torch.float32
discriminator_out dtype: torch.float16
pred_fake: torch.float16
amp discriminor
discriminator_out dtype: torch.float16
self.get_zero_tensor(input) dtype: torch.float16
input dtype: torch.float16
self.get_zero_tensor(input) dtype: torch.float16

On two GPUs:

discriminator_out dtype: torch.float32
self.get_zero_tensor(input) dtype: torch.float32
input dtype: torch.float32
self.get_zero_tensor(input) dtype: torch.float32
input dtype: torch.float32
discriminator_out dtype: torch.float32
self.get_zero_tensor(input) dtype: torch.float32
input dtype: torch.float32
self.get_zero_tensor(input) dtype: torch.float32
input dtype: torch.float32
fake_image_orig: torch.float32
gen loss: torch.float32
discriminator_out dtype: torch.float32
pred_fake: torch.float32
fake_image_orig: torch.float32
gen loss: torch.float32
discriminator_out dtype: torch.float32
pred_fake: torch.float32
fake_image_orig: torch.float32
gen loss: torch.float32
fake_image_orig: torch.float32
gen loss: torch.float32
discriminator_out dtype: torch.float32

Yes, amp is usable with nn.DataParallel, as described here.
I guess you might have missed this note about the @autocast() decorator for the forward method, if you are using nn.DataParallel.
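
For reference, the pattern from that note looks roughly like this (a minimal sketch; MyModel, the layer sizes, and the dummy tensors are just placeholders for your own setup):

```python
import torch
import torch.nn as nn
from torch.cuda.amp import autocast, GradScaler

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(16, 16)

    @autocast()  # autocast state is thread-local, so re-enable it in each DataParallel worker thread
    def forward(self, x):
        return self.net(x)

model = nn.DataParallel(MyModel().cuda())
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = GradScaler()

data = torch.randn(8, 16, device='cuda')
target = torch.randn(8, 16, device='cuda')

optimizer.zero_grad()
with autocast():
    output = model(data)                          # forward runs in float16 on every replica
    loss = nn.functional.mse_loss(output.float(), target)
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```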

In cases where multi-GPU training can be enabled via command-line arguments, and the script thus has to work in single- as well as multi-GPU scenarios, can we just put the autocast in forward for single GPUs as well?

In fact, I tried the experiment on another machine with pytorch-nightly and found that it works there.
It is weird.

Yes, that wouldn’t be necessary, but shouldn’t hurt either.

Just to be clear: it is necessary in the DP cases, but not so in single GPU scenarios?

Yes, that is correct.
The forward method has to be annotated for nn.DataParallel and for DDP with multiple GPUs per process, but not for single-GPU or DDP with a single GPU per process (the recommended and fastest approach) use cases.
You should be able to do it anyway without any disadvantages, but let us know if it breaks.
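
If you prefer the context manager over the decorator, the same thing works inside forward; again just a sketch with placeholder names:

```python
import torch.nn as nn
from torch.cuda.amp import autocast

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(16, 16)

    def forward(self, x):
        # autocast is thread-local, so it has to be re-entered here when
        # nn.DataParallel (or DDP with multiple GPUs per process) runs this
        # forward in a side thread; on a single GPU, or with DDP using one
        # GPU per process, the surrounding `with autocast():` in the
        # training loop is already sufficient.
        with autocast():
            return self.net(x)
```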

So it means I need to use amp.autocast in every single forward for nn.DataParallel and for DDP with multiple GPUs per process?