I have gone through the Introduction to PyTorch tutorial and searched the forum for information on backward(), but one aspect of autograd remains unclear to me. Suppose we have two nets, A and B, where the output of A is fed into B:
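The original snippet seems to have been lost, so here is a minimal sketch of the setup being described, assuming A and B are simple modules chained together and only B has an optimizer (the Linear layers, shapes, and SGD choice are placeholders):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical reconstruction: net A's output is fed into net B,
# and only B has an optimizer attached.
A = nn.Linear(4, 8)
B = nn.Linear(8, 2)
B_solver = torch.optim.SGD(B.parameters(), lr=0.1)

x = torch.randn(16, 4)
target = torch.randn(16, 2)

A_before = A.weight.detach().clone()
B_before = B.weight.detach().clone()

loss = F.mse_loss(B(A(x)), target)
B_solver.zero_grad()
loss.backward()    # computes gradients for A's parameters as well as B's
B_solver.step()    # ...but only updates B's parameters
```

After this runs, A.weight.grad is populated (backward() went all the way through A), yet A.weight itself is unchanged, because B_solver was constructed only from B.parameters().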

The way backward() is described, this would calculate gradients all the way back through net A. Am I right in thinking that B_solver.step() will not update the weights of A? If so, what would be a natural way of updating the weights of A? We do not have an explicit loss function for A alone. Doing the obvious thing, we quickly run into the need to retain the computational graph (which the documentation discourages):
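Again the snippet appears to be missing, so here is a sketch of what I assume "the obvious thing" refers to: calling backward() a second time on the same loss so that A can get its own update (A_solver and the layer shapes are hypothetical stand-ins):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-ins for the two nets and their optimizers.
A = nn.Linear(4, 8)
B = nn.Linear(8, 2)
A_solver = torch.optim.SGD(A.parameters(), lr=0.1)
B_solver = torch.optim.SGD(B.parameters(), lr=0.1)

x = torch.randn(16, 4)
target = torch.randn(16, 2)

loss = F.mse_loss(B(A(x)), target)
loss.backward()            # by default the graph is freed after this call
try:
    loss.backward()        # a second backward over the same graph fails
except RuntimeError:
    second_backward_failed = True

# The workaround is retain_graph=True, which the docs discourage:
loss2 = F.mse_loss(B(A(x)), target)
loss2.backward(retain_graph=True)
loss2.backward()           # works now, but gradients accumulate twice
```

The second backward() without retain_graph raises a RuntimeError ("Trying to backward through the graph a second time"), which is exactly the point where one is pushed toward retain_graph=True.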