Suppose I have two networks A and B in sequence, and two different loss functions applied to the resulting features, i.e., in the forward pass I have:
y = A(x)
z = B(y)
loss1 = loss_func1(z)
loss2 = loss_func2(z)
loss1 should update only network A, while loss2 should update both A and B.
I have separate optimizers for A and B. What I do now is:
optim_A.zero_grad()
(loss1 + loss2).backward(retain_graph=True)  # retain_graph so the second backward below can reuse the graph
optim_A.step()

optim_B.zero_grad()
loss2.backward()
optim_B.step()
But I think that when updating B, the backward pass still propagates all the way back to the input x. Am I right?
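As a quick check of this, here is a minimal sketch with toy linear layers standing in for A and B (the shapes and the concrete loss are my own assumptions, not from the setup above). After calling backward on loss2 alone, A's parameters have gradients, which shows the pass did traverse A:

```python
import torch
import torch.nn as nn

# Toy stand-ins for networks A and B (shapes are assumptions).
A = nn.Linear(4, 4)
B = nn.Linear(4, 4)

x = torch.randn(2, 4)
y = A(x)
z = B(y)
loss2 = z.pow(2).mean()  # stand-in for loss_func2

loss2.backward()

# Gradients were accumulated into A's parameters as well, so the
# backward pass for loss2 traversed A on its way back toward x.
print(A.weight.grad is not None)
```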
What if I just want the backward pass for loss2 to stop at y, to save computation when A is very complicated? detach() is not useful here, because loss1 still needs gradients to flow through A.
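One option that may help is the `inputs=` argument of `Tensor.backward()`: autograd then accumulates gradients only for the tensors listed, and can skip the parts of the graph that are not needed for them. A hedged sketch of the scheme above, again with toy layers and made-up losses (whether this actually prunes the traversal behind y in your case is worth profiling, not something I can promise):

```python
import torch
import torch.nn as nn

# Toy stand-ins for A and B; shapes, losses, and lr are assumptions.
A = nn.Linear(4, 4)
B = nn.Linear(4, 4)
optim_A = torch.optim.SGD(A.parameters(), lr=0.1)
optim_B = torch.optim.SGD(B.parameters(), lr=0.1)

x = torch.randn(2, 4)
y = A(x)
z = B(y)
loss1 = z.abs().mean()   # stand-in for loss_func1
loss2 = z.pow(2).mean()  # stand-in for loss_func2

optim_A.zero_grad()
optim_B.zero_grad()

# Update A with loss1 + loss2; keep the graph for the second backward.
(loss1 + loss2).backward(retain_graph=True, inputs=list(A.parameters()))

# Update B with loss2 only; restricting `inputs` to B's parameters
# means autograd only needs gradients on the B side of the graph.
loss2.backward(inputs=list(B.parameters()))

optim_A.step()
optim_B.step()
```

A side benefit: because the first backward only accumulates into A's parameters, B's `.grad` buffers come purely from the second backward, which matches the intended update rule.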