Questions about the memory function of backward()

Consider this situation: I combine networks a and b's parameters into one optimizer. Then, for loss_a produced by a and loss_b produced by b, I do this:
err = loss_a + loss_b
err.backward()
optimizer.step()

Is it the same as:
loss_a.backward()
optimizer_a.step()
loss_b.backward()
optimizer_b.step()

Thanks.

If loss_a and loss_b do not share a computation graph (apart from the final addition into err in the first approach), and if the optimizers in the second approach do not share any parameters, the two approaches should be equivalent. At least I don't see how they could interact.
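Here is a minimal sketch that checks this numerically; the toy linear models, MSE loss, and SGD optimizer are assumptions for illustration, not from the original post:

# Two independent models with no shared parameters or graph.
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
a = nn.Linear(4, 1)
b = nn.Linear(4, 1)
a2, b2 = copy.deepcopy(a), copy.deepcopy(b)  # identical copies for the second approach

x = torch.randn(8, 4)
target = torch.randn(8, 1)

# Approach 1: one optimizer over both parameter sets, single backward on the summed loss.
opt = torch.optim.SGD(list(a.parameters()) + list(b.parameters()), lr=0.1)
loss_a = nn.functional.mse_loss(a(x), target)
loss_b = nn.functional.mse_loss(b(x), target)
err = loss_a + loss_b
err.backward()
opt.step()

# Approach 2: separate optimizers and separate backward/step calls.
opt_a = torch.optim.SGD(a2.parameters(), lr=0.1)
opt_b = torch.optim.SGD(b2.parameters(), lr=0.1)
nn.functional.mse_loss(a2(x), target).backward()
opt_a.step()
nn.functional.mse_loss(b2(x), target).backward()
opt_b.step()

# The updated parameters match because the graphs and parameter sets are disjoint.
print(torch.allclose(a.weight, a2.weight), torch.allclose(b.weight, b2.weight))  # True True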
