[About CVPR best paper] What needs attention when using nn.DataParallel

This question arose from the efficient implementation of DenseNet (CVPR 2017 best paper award), which can be found here.

However, I found that this implementation is not compatible with parallel computing: it cannot run on multiple GPUs because it fails with "RuntimeError: tensors are on different GPUs". The model works fine on a single GPU, so I was wondering what needs attention to make code compatible with nn.DataParallel.
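For context, here is a minimal sketch of two habits that usually matter under nn.DataParallel. It is not taken from the DenseNet code, and it is only an assumption that the error there has a similar cause. ScaledLinear and its scale buffer are hypothetical names made up for illustration. The idea is that state the module needs should be registered as a buffer (so each replica gets its own copy on its own GPU), and temporary tensors should be created on the input's device rather than on a fixed one.

```python
import torch
import torch.nn as nn


class ScaledLinear(nn.Module):
    """Hypothetical layer, used only to illustrate device handling."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # Registered buffers are replicated to each GPU along with the parameters.
        # A plain attribute such as `self.scale = torch.ones(...).cuda()` would
        # stay on a single GPU and could trigger a device mismatch in replicas.
        self.register_buffer("scale", torch.full((out_features,), 0.5))

    def forward(self, x):
        # Allocate temporaries on the input's device, not a hard-coded one.
        offset = torch.zeros(x.size(0), self.linear.out_features, device=x.device)
        return self.linear(x) * self.scale + offset


# Falls back to CPU when no GPU is present; DataParallel then calls the module directly.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.DataParallel(ScaledLinear(8, 4)).to(device)
out = model(torch.randn(6, 8, device=device))
print(out.shape)
```

Code that caches tensors or storage shared across layers on one device, outside of parameters and buffers, is the kind of thing that would need a per-device copy to work with this pattern.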

By the way, the code for the efficient DenseNet implementation can be found here.