Yes, you can. See Uneven GPU utilization during training backpropagation - #14 by colllin for an example of wrapping the loss function with DataParallel.
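A minimal sketch of the idea, assuming PyTorch: move the loss computation into an `nn.Module` so that `nn.DataParallel` replicates the model and the criterion together, computing each replica's loss on its own GPU instead of gathering all outputs onto GPU 0 first. The `CriterionWrapper` name here is a hypothetical helper, not an API from the linked thread.

```python
import torch
import torch.nn as nn

class CriterionWrapper(nn.Module):  # hypothetical helper name
    """Runs the model and computes the loss in forward(), so that
    DataParallel replicates both and each GPU reduces its own shard."""
    def __init__(self, model, criterion):
        super().__init__()
        self.model = model
        self.criterion = criterion

    def forward(self, inputs, targets):
        outputs = self.model(inputs)
        # Return a 1-element tensor per replica; DataParallel concatenates
        # these on the output device, where they can be averaged.
        return self.criterion(outputs, targets).unsqueeze(0)

model = nn.Linear(10, 2)
wrapper = CriterionWrapper(model, nn.CrossEntropyLoss())
if torch.cuda.device_count() > 1:
    wrapper = nn.DataParallel(wrapper)  # replicate model + loss together

x = torch.randn(4, 10)
y = torch.randint(0, 2, (4,))
loss = wrapper(x, y).mean()  # average the per-replica losses to a scalar
loss.backward()
```

Since each replica returns only a scalar loss rather than the full output tensor, far less data is gathered back to the default device, which evens out memory use across GPUs.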