I have never done it. Suppose you call another torch module, and that module computes its own forward and backward passes and has its own optimization sub-routine. Suppose the task can be seen as a separate, parallel task. How can you do that? Because the secondary backward step, I think, will trigger the entire backward pass.
For example, in my case, inside a training loop I have a secondary computation task: solving Ax=b (an oversimplified example). So alongside the main optimization, i.e., inside the training loop, we (for the sake of simplicity) solve Ax=b, and it should not influence gradient flow back to the main optimizer.
loss() → let's say the loss computes backward() somewhere internally → the output of that should not affect the main optimizer's gradients.
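One way this is usually handled (a minimal sketch, not from the original post: the inner solver, the `solve_inner` helper, and the outer parameter `w` are all hypothetical names) is to give the inner problem its own leaf tensor and its own optimizer, `detach()` its inputs so it builds a separate graph, and return the result detached so the outer `backward()` never sees the inner computation:

```python
import torch

def solve_inner(A, b, steps=50, lr=0.1):
    # Detach inputs so the inner problem builds its own, separate graph.
    A = A.detach()
    b = b.detach()
    x = torch.zeros_like(b, requires_grad=True)  # fresh leaf for the inner task
    opt = torch.optim.SGD([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        residual = A @ x - b
        inner_loss = residual.pow(2).sum()
        inner_loss.backward()  # only touches x's graph, not the outer one
        opt.step()
    return x.detach()  # result carries no grad history into the outer loop

# Outer training loop (w is a stand-in for the real model parameters)
torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)
outer_opt = torch.optim.SGD([w], lr=0.01)

A = torch.eye(3) * 2.0
b = torch.ones(3)

for _ in range(5):
    outer_opt.zero_grad()
    x = solve_inner(A, b)        # inner backward() runs here, in isolation
    loss = ((w - x) ** 2).sum()  # outer loss treats x as a constant
    loss.backward()              # only flows into w
    outer_opt.step()
```

Because `x` is returned detached, the inner `backward()` calls can never "trigger the entire backward pass"; the outer graph simply does not contain them. If you do want gradients through the solution of Ax=b, that is a different setup (implicit differentiation / `torch.linalg.solve`), not a second optimizer.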