I have an array of models, with corresponding optimizers and losses. Without mixed precision, I sum the losses and then call backward once:

`(loss1 + loss2 + ...).backward()`

Now, while doing mixed precision training with `torch.cuda.amp`, according to the sample code I should call backward on each individual loss:

`scaler.scale(loss1).backward()`

`scaler.scale(loss2).backward()`

and so on.

So I wanted to know which is the correct way: applying `backward()` on the individual losses, or on the sum total? Is the backward function linear, i.e. does summing the losses before calling it produce the same gradients?
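For the non-amp case, backward is linear in this sense: gradients accumulate into `.grad`, so calling `backward()` per loss or once on the sum yields the same result. A minimal standalone sanity check (toy tensors, no amp involved):

```python
import torch

# One shared parameter, two losses built from it.
torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)
x = torch.randn(3)

# Variant A: backward on each loss; grads accumulate into w.grad.
loss1 = (w * x).sum() ** 2
loss2 = (w + x).sum() ** 2
loss1.backward()
loss2.backward()
grad_individual = w.grad.clone()

# Variant B: backward once on the summed loss.
w.grad = None
loss1 = (w * x).sum() ** 2
loss2 = (w + x).sum() ** 2
(loss1 + loss2).backward()
grad_summed = w.grad.clone()

print(torch.allclose(grad_individual, grad_summed))  # True
```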

**Code for reference**

```
scaler = torch.cuda.amp.GradScaler()
for epoch in epochs:
    for input, target in data:
        optimizer0.zero_grad()
        optimizer1.zero_grad()
        with autocast():
            output0 = model0(input)
            output1 = model1(input)
            loss0 = loss_fn(2 * output0 + 3 * output1, target)
            loss1 = loss_fn(3 * output0 - 5 * output1, target)
        # loss0 and loss1 share graph nodes (output0, output1), so the
        # first backward() must retain the graph for the second one.
        scaler.scale(loss0).backward(retain_graph=True)
        scaler.scale(loss1).backward()
        # You can choose which optimizers receive explicit unscaling, if you
        # want to inspect or modify the gradients of the params they own.
        scaler.unscale_(optimizer0)
        scaler.step(optimizer0)
        scaler.step(optimizer1)
        scaler.update()
```
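Since `scaler.scale()` just multiplies the loss by the current scale factor, scaling the summed loss should be equivalent to scaling each loss and calling backward twice. Below is a sketch of that variant for a single iteration; the tiny `nn.Linear` stand-ins for `model0`/`model1` and the `enabled=use_cuda` flag (which makes the scaler and autocast no-op passthroughs on CPU, so the control flow still runs without a GPU) are my assumptions, not part of the original sample:

```python
import torch
from torch import nn

use_cuda = torch.cuda.is_available()
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)

# Hypothetical tiny stand-ins for model0/model1 from the snippet above.
model0, model1 = nn.Linear(4, 4), nn.Linear(4, 4)
optimizer0 = torch.optim.SGD(model0.parameters(), lr=0.1)
optimizer1 = torch.optim.SGD(model1.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
input, target = torch.randn(8, 4), torch.randn(8, 4)

optimizer0.zero_grad()
optimizer1.zero_grad()
with torch.cuda.amp.autocast(enabled=use_cuda):
    output0 = model0(input)
    output1 = model1(input)
    loss0 = loss_fn(2 * output0 + 3 * output1, target)
    loss1 = loss_fn(3 * output0 - 5 * output1, target)

# One backward over the summed, scaled loss; no retain_graph needed
# because the shared graph is traversed only once.
scaler.scale(loss0 + loss1).backward()
scaler.step(optimizer0)
scaler.step(optimizer1)
scaler.update()
```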