Training performance degrades with DistributedDataParallel

The source code is pretty straightforward.