If you want the loss to be split among GPUs, just make your loss layer part of the DataParallel module and add a sum or mean over what you get out of it. That way, if you use DataParallel across 4 devices, only 4 extra numbers will be allocated on the output_device. Is that a good solution for you?