I have an array of models, with corresponding optimizers and losses. Without mixed precision, I sum the losses and then call backward once:
(loss1 + loss2 + ...).backward()
Now, while doing mixed precision training with torch.cuda.amp, according to the sample code given, I should apply backward on the individual losses:
scaler.scale(loss1).backward()
scaler.scale(loss2).backward()
and so on.
So I wanted to know which is the correct way: applying backward() on the individual losses, or on their sum? Is the backward function linear, i.e. does backpropagating through a sum of losses produce the same gradients as backpropagating through each loss separately and letting them accumulate?
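To make the question concrete, here is a minimal toy check of that linearity (the tensor names are made up for illustration, not from my actual code):

import torch

x = torch.randn(4, requires_grad=True)

# Backward on each loss separately; gradients accumulate into x.grad.
loss1 = (x ** 2).sum()
loss2 = (3 * x).sum()
loss1.backward(retain_graph=True)
loss2.backward()
grad_separate = x.grad.clone()

# Reset and backward once on the sum.
x.grad = None
loss1 = (x ** 2).sum()
loss2 = (3 * x).sum()
(loss1 + loss2).backward()

print(torch.allclose(grad_separate, x.grad))  # prints True if the two are equivalent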
Code for reference:
from torch.cuda.amp import autocast

scaler = torch.cuda.amp.GradScaler()

for epoch in epochs:
    for input, target in data:
        optimizer0.zero_grad()
        optimizer1.zero_grad()

        with autocast():
            output0 = model0(input)
            output1 = model1(input)
            loss0 = loss_fn(2 * output0 + 3 * output1, target)
            loss1 = loss_fn(3 * output0 - 5 * output1, target)

        # retain_graph=True because both backward() calls share sections
        # of the graph (loss0 and loss1 both depend on output0 and output1).
        scaler.scale(loss0).backward(retain_graph=True)
        scaler.scale(loss1).backward()

        # You can choose which optimizers receive explicit unscaling, if you
        # want to inspect or modify the gradients of the params they own.
        scaler.unscale_(optimizer0)

        scaler.step(optimizer0)
        scaler.step(optimizer1)
        scaler.update()
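For comparison, this is the summed variant I had in mind for the backward section. It is a sketch under the assumption that the two approaches are equivalent, not something the sample code confirms:

# Hypothetical alternative: scale the summed loss and call backward once.
# retain_graph=True would no longer be needed here, since the shared
# graph is only traversed once.
scaler.scale(loss0 + loss1).backward()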