I want to train a model with multiple branches.
When I train a DDP model with multiple backward passes (using retain_graph=True), it still fails to update the weights correctly (the gradients explode).
For example, I wrote the code below.
If DDP supports multiple forward/backward passes, could you suggest the correct usage?
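To make the question concrete, here is a minimal single-process sketch of the multi-branch pattern I am asking about. The toy model and names (`TwoBranch`, `head_a`, `head_b`) are my own, not the original code; it combines both branch losses into a single backward call, which I assume is one candidate for the "correct usage":

```python
import os
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process process group just to construct a DDP model for illustration.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

class TwoBranch(nn.Module):
    """Toy model: one shared trunk feeding two output branches."""
    def __init__(self):
        super().__init__()
        self.trunk = nn.Linear(8, 8)
        self.head_a = nn.Linear(8, 1)
        self.head_b = nn.Linear(8, 1)

    def forward(self, x):
        h = self.trunk(x)
        return self.head_a(h), self.head_b(h)

model = DDP(TwoBranch())
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(4, 8)
out_a, out_b = model(x)

# Sum the branch losses and call backward() once, so DDP's gradient
# all-reduce hooks fire a single time per iteration instead of once
# per branch (no retain_graph needed).
loss = out_a.mean() + out_b.mean()
opt.zero_grad()
loss.backward()
opt.step()

dist.destroy_process_group()
```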