What is the PyTorch 1.0 way to train or evaluate with multiple GPUs?

I noticed that the new features in PyTorch 1.0 include improved distributed training. Does this have anything to do with training or evaluating on a single local machine with multiple GPUs, or does it only apply to cross-machine scenarios? If it plays the same role as nn.DataParallel, what is the PyTorch 1.0 way to write code that does what nn.DataParallel does?


Yes, it is mainly focused on improving multi-machine training via DistributedDataParallel.
All the relevant improvements have also been added to the DataParallel module.
So multi-GPU training (split on data) on the same machine should still be done with DataParallel!
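
For reference, here is a minimal sketch of wrapping a model in nn.DataParallel on a single machine. The toy nn.Linear model, the batch size, and the random inputs are placeholders I made up for illustration; substitute your own model and data. It should fall back to a single GPU or the CPU when fewer than two GPUs are visible.

```python
import torch
import torch.nn as nn

# Hypothetical toy model, used only for illustration.
model = nn.Linear(10, 2)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Wrap the model only when more than one GPU is visible. DataParallel
# splits each input batch along dim 0 across the GPUs and gathers the
# outputs back on the first device.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model = model.to(device)

# Made-up batch of 8 samples with 10 features each.
inputs = torch.randn(8, 10, device=device)

# Evaluation: switch to eval mode and disable gradient tracking.
model.eval()
with torch.no_grad():
    outputs = model(inputs)

print(outputs.shape)
```

Training looks the same from the caller's side: call model.train(), compute the loss on the gathered outputs, and call loss.backward() as usual. Note that the original network is available as model.module once it is wrapped, which matters when saving the state_dict.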