DataParallel imbalanced memory usage

albanD · August 8, 2018, 4:30pm

If you want the loss computation to be split across GPUs, just make your loss layer part of the DataParallel module and apply a sum or mean to what you get out of it. That way, if you use DataParallel on 4 devices, only 4 extra numbers (one loss value per device) will be allocated on the output_device, instead of the full model outputs being gathered there. Is that a good solution for you?
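Here is a minimal sketch of what I mean. The `ModelWithLoss` wrapper, the `nn.Linear` stand-in model, and the tensor shapes are made up for illustration, so swap in your own model and criterion. It falls back to CPU when no GPU is available, in which case DataParallel simply calls the wrapped module directly.

```python
import torch
import torch.nn as nn


class ModelWithLoss(nn.Module):
    """Hypothetical wrapper: runs the model and the criterion on each replica."""

    def __init__(self, model, criterion):
        super().__init__()
        self.model = model
        self.criterion = criterion

    def forward(self, inputs, targets):
        outputs = self.model(inputs)
        loss = self.criterion(outputs, targets)
        # Return shape (1,) so DataParallel gathers one number per device.
        return loss.unsqueeze(0)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(10, 3)            # stand-in for your real network
criterion = nn.CrossEntropyLoss()   # stand-in for your real loss
wrapped = nn.DataParallel(ModelWithLoss(model, criterion).to(device))

inputs = torch.randn(8, 10, device=device)
targets = torch.randint(0, 3, (8,), device=device)

# Inputs and targets are both scattered along dim 0; each replica
# computes its own loss, and only those scalars reach output_device.
per_device_loss = wrapped(inputs, targets)
loss = per_device_loss.mean()
loss.backward()
```

Note that the mean of the per-device means only equals the mean over the whole batch when every device receives the same number of samples. If your batch size does not divide evenly, you may prefer `reduction="sum"` in the criterion and divide by the batch size yourself.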