DistributedDataParallel on multiple GPU nodes slower than one GPU node

Two questions,

  1. Did you divide the epoch size on each process by world_size, i.e., is each rank only iterating over its own shard of the data? (See the sketch below.)
  2. Will there be any contention on the data loader (e.g., ranks on the same node competing for workers or disk I/O)?
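
For (1), here is a minimal sketch of what I mean, assuming a generic `dataset` object; the `batch_size` and `num_workers` values are just placeholders to tune. The sampler is the standard `torch.utils.data.distributed.DistributedSampler`:

```python
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def make_loader(dataset, batch_size=32):
    # DistributedSampler shards the dataset across ranks, so each process
    # iterates over roughly len(dataset) / world_size samples per epoch.
    # Without it, every rank reprocesses the full dataset, and a multi-node
    # run can easily end up slower than a single node.
    sampler = DistributedSampler(dataset)
    return DataLoader(
        dataset,
        batch_size=batch_size,  # per-process batch size
        sampler=sampler,        # mutually exclusive with shuffle=True
        num_workers=4,          # dedicated workers per rank (relates to (2))
        pin_memory=True,        # speeds up host-to-GPU copies
    )
```

One thing to remember with this setup: call `sampler.set_epoch(epoch)` at the start of each epoch so the shuffling order differs across epochs.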

cc @osalpekar

Also cc @zhangguanheng66 for transformer questions