Questions about the memory function of backward()

Consider this situation: I combine networks a and b's parameters into one optimizer. Then, for loss_a produced by a and loss_b produced by b, I do this:
err = loss_a + loss_b
err.backward()
optimizer.step()

Is it the same as:
loss_a.backward()
optimizer_a.step()
loss_b.backward()
optimizer_b.step()

Thanks.

If loss_a and loss_b do not share a computation graph (apart from the final addition into err in the first approach), and if the optimizers in the second approach do not share any parameters, the two approaches should be equivalent. At least I don't see how they could interact.
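Here is a minimal sketch that checks this numerically; the toy linear models, MSE loss, and SGD optimizer are assumptions for illustration, not from the original post:

# Two independent models with no shared parameters or graph.
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
a = nn.Linear(4, 1)
b = nn.Linear(4, 1)
a2, b2 = copy.deepcopy(a), copy.deepcopy(b)  # identical copies for the second approach

x = torch.randn(8, 4)
target = torch.randn(8, 1)

# Approach 1: one optimizer over both parameter sets, single backward on the summed loss.
opt = torch.optim.SGD(list(a.parameters()) + list(b.parameters()), lr=0.1)
loss_a = nn.functional.mse_loss(a(x), target)
loss_b = nn.functional.mse_loss(b(x), target)
err = loss_a + loss_b
err.backward()
opt.step()

# Approach 2: separate optimizers and separate backward/step calls.
opt_a = torch.optim.SGD(a2.parameters(), lr=0.1)
opt_b = torch.optim.SGD(b2.parameters(), lr=0.1)
nn.functional.mse_loss(a2(x), target).backward()
opt_a.step()
nn.functional.mse_loss(b2(x), target).backward()
opt_b.step()

# The updated parameters match because the graphs and parameter sets are disjoint.
print(torch.allclose(a.weight, a2.weight), torch.allclose(b.weight, b2.weight))  # True True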
