I am currently using DistributedDataParallel (DDP) for multi-GPU training. So far I have not needed to exchange data across GPUs explicitly, because DDP automatically averages the gradients across all GPUs during the backward pass before the model replicas are updated.
Now, however, I would like to extend my loss function with a calculation that requires data from all GPUs, so I need to explicitly exchange tensors between GPUs before calling backward. Say the loss is calculated from two terms:
- One term, such as a reconstruction loss, that can be computed on each GPU separately and treated as usual, i.e. its gradients are simply averaged across GPUs.
- A second term, such as a statistic over all samples in the batches across the different GPUs, for which the GPUs need to sync / exchange data before the loss calculation and backward pass.
So I need to know (A) how to achieve this data exchange for the second term, and (B) how to compute the loss and perform a single backward pass with these two types of losses combined.
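To make the question concrete, here is a minimal sketch of what I have in mind. It assumes `torch.distributed.nn.all_gather` (the autograd-aware collective, as opposed to the plain `torch.distributed.all_gather`, which does not propagate gradients), and the "statistic" term is just a hypothetical placeholder (squared norm of the global feature mean). The single-process `gloo` group at the bottom is only so the snippet runs standalone; real training would use one process per GPU under `torchrun` with the `nccl` backend.

```python
import os
import torch
import torch.distributed as dist
import torch.distributed.nn  # differentiable collectives (all_gather with autograd)

def combined_loss(recon, target, features):
    # Term 1: computed independently on each GPU. DDP already averages its
    # gradients across ranks during backward, so nothing special is needed.
    recon_loss = torch.nn.functional.mse_loss(recon, target)

    # Term 2: gather the per-GPU feature batches with autograd support, so
    # gradients of the global statistic flow back to every rank's local tensor.
    gathered = torch.distributed.nn.all_gather(features)  # list of [B, D] tensors
    all_features = torch.cat(gathered, dim=0)             # [world_size * B, D]
    # Hypothetical cross-batch statistic: squared norm of the global mean.
    stat_loss = all_features.mean(dim=0).pow(2).sum()

    # Sum the two terms into one scalar and call backward() on it once.
    return recon_loss + stat_loss

# Single-process group for illustration only; real multi-GPU training would
# launch one process per GPU (e.g. via torchrun) with backend="nccl".
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29512")
dist.init_process_group("gloo", rank=0, world_size=1)

recon = torch.randn(4, 8, requires_grad=True)
target = torch.randn(4, 8)
features = torch.randn(4, 16, requires_grad=True)

loss = combined_loss(recon, target, features)
loss.backward()  # one backward pass covers both terms

dist.destroy_process_group()
```

Is this roughly the intended pattern, i.e. the differentiable all_gather inside the loss and a single backward on the summed scalar?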
Any help is much appreciated!