Using the DataParallel module to train GNMT

Hi all,

I found that there is a way to train a GNMT model with DistributedDataParallel (DDP). I want to ask whether we could also use the DataParallel module (Optional: Data Parallelism — PyTorch Tutorials 1.8.0 documentation) to train a GNMT model, and what the difference would be when training GNMT this way instead of with DDP.
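For reference, this is roughly what I have in mind with DataParallel. It is only a minimal sketch: `GNMTLike` is a placeholder encoder-decoder, not the real GNMT implementation, and the shapes/vocab size are made up.

```python
import torch
import torch.nn as nn

# Placeholder stand-in for a GNMT-style encoder-decoder model.
# Replace this with the actual GNMT implementation.
class GNMTLike(nn.Module):
    def __init__(self, vocab_size=32000, hidden_size=1024, num_layers=4):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_size)
        self.encoder = nn.LSTM(hidden_size, hidden_size,
                               num_layers=num_layers, batch_first=True)
        self.decoder = nn.LSTM(hidden_size, hidden_size,
                               num_layers=num_layers, batch_first=True)
        self.proj = nn.Linear(hidden_size, vocab_size)

    def forward(self, src, tgt):
        # Encode source tokens, then decode target tokens
        # initialized from the encoder's final state.
        enc_out, enc_state = self.encoder(self.embedding(src))
        dec_out, _ = self.decoder(self.embedding(tgt), enc_state)
        return self.proj(dec_out)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = GNMTLike().to(device)

# Single-process, multi-GPU: DataParallel replicates the model on every
# visible GPU, splits each input batch along dim 0, and gathers the outputs
# back onto the default device. batch_first=True keeps the batch on dim 0
# so the scatter works as expected.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

src = torch.randint(0, 32000, (64, 20), device=device)  # (batch, src_len)
tgt = torch.randint(0, 32000, (64, 20), device=device)  # (batch, tgt_len)
logits = model(src, tgt)                                 # (batch, tgt_len, vocab)
```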

I appreciate any help.