I’d just like to confirm that DDP behaves correctly in the following use case. If I’m not mistaken, upon `loss.backward()` each parameter should accumulate the **combined** gradient from `loss_1` and `loss_2`, so gradient synchronization should happen only once. If that’s right, does this extend to composite losses with an arbitrary number of components?

```
model = DDP(some_model)
for x1, x2 in loader:
    optimizer.zero_grad()
    p1 = model(x1)
    p2 = model(x2)
    loss = loss_1(p1) + loss_2(p2)
    loss.backward()  # gradients from both terms accumulate, then sync once
    optimizer.step()
```
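
As a sanity check of the single-process part of the question (leaving DDP’s synchronization aside), here is a minimal sketch showing that `backward()` on a composite loss accumulates the sum of the per-term gradients in one pass. The toy `Linear` model and the `mean()` losses are placeholders standing in for `some_model`, `loss_1`, and `loss_2`.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)
x1, x2 = torch.randn(8, 4), torch.randn(8, 4)

# Gradient of the combined loss in a single backward pass.
model.zero_grad()
(model(x1).mean() + model(x2).mean()).backward()
combined = model.weight.grad.clone()

# Gradients of each term computed separately, then summed.
model.zero_grad()
model(x1).mean().backward()
g1 = model.weight.grad.clone()
model.zero_grad()
model(x2).mean().backward()
g2 = model.weight.grad.clone()

# The combined gradient matches the sum of the per-term gradients.
print(torch.allclose(combined, g1 + g2))
```

Under DDP, that single `backward()` call is also what triggers the (one) all-reduce per bucket, which is the behavior the question is asking about.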

Quick update: I’ve read this thread, and it seems this will not work. Please let me know if anything has changed since then.