Suppose I have two networks A and B in sequence, with two different loss functions applied to the resulting features, i.e. in the forward pass I have:

```
y = A(x)
z = B(y)
loss1 = loss_func1(z)
loss2 = loss_func2(z)
```
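For concreteness, here is a runnable instantiation of this forward pass (the `nn.Linear` shapes and dummy loss functions are just placeholders, not my real setup):

```python
import torch
import torch.nn as nn

A = nn.Linear(4, 4)   # stands in for my (complicated) network A
B = nn.Linear(4, 2)   # stands in for network B

x = torch.randn(8, 4)
y = A(x)
z = B(y)
loss1 = z.pow(2).mean()        # dummy loss_func1
loss2 = (z - 1).pow(2).mean()  # dummy loss_func2
```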

loss1 should only update network A, while loss2 should update both A and B.

I have separate optimizers for A and B. What I currently do is:

```
optim_A.zero_grad()
(loss1 + loss2).backward(retain_graph=True)  # retain the graph, since loss2.backward() is called again below
optim_A.step()
```

```
optim_B.zero_grad()
loss2.backward()
optim_B.step()
```

But I think that when updating B, the second backward pass still propagates all the way back through A, down to the input x. Am I right?

What if I just want the backward pass to stop at y, to save computation in case A is very complicated? Detaching y is not useful, because for loss1 we still need the backward pass to go through A.
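To make the question concrete, here is a minimal self-contained sketch of what I am after (the toy `nn.Linear` modules and dummy losses are assumptions, not my real models). The last few lines use `torch.autograd.grad` restricted to B's parameters, which I believe only traverses the part of the graph needed to reach those parameters, and so should not backpropagate into A — is that the right way to do it?

```python
import torch
import torch.nn as nn

# Toy stand-ins for networks A and B (assumed shapes).
A = nn.Linear(4, 4)
B = nn.Linear(4, 2)
optim_A = torch.optim.SGD(A.parameters(), lr=0.1)
optim_B = torch.optim.SGD(B.parameters(), lr=0.1)

x = torch.randn(8, 4)
y = A(x)
z = B(y)
loss1 = z.pow(2).mean()        # dummy loss_func1
loss2 = (z - 1).pow(2).mean()  # dummy loss_func2

# Update A with loss1 + loss2; retain the graph for the second pass.
optim_A.zero_grad()
(loss1 + loss2).backward(retain_graph=True)
optim_A.step()

# Update B with loss2 only: torch.autograd.grad with inputs=B.parameters()
# computes gradients only w.r.t. the listed tensors, so the backward pass
# should stop at B's boundary instead of continuing through A.
optim_B.zero_grad()
grads = torch.autograd.grad(loss2, list(B.parameters()))
for p, g in zip(B.parameters(), grads):
    p.grad = g  # manually assign, then step as usual
optim_B.step()
```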

Thanks!