@apaszke @ptrblck I just had a small follow-up, a slightly different query.
Suppose model1 produces an output `a` that is fed to model2, and perhaps later to model3 as well, and we want each model to be updated by a different loss. How does `.detach()` behave in that setup?
Say `a.detach()` is the input to model2: if we compute a loss and call `.backward()`, gradients will only be computed for model2, not model1.
But then in the next step, when `a` (not the detached version) is given to model3 and `.backward()` is called on the new loss, do gradients get computed for model3 as well as model1, or just model3?
Basically, does calling `.detach()` return a detached copy of the original output `a`, or does it detach the original `a` itself from the computation graph?
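To make the scenario concrete, here is a minimal sketch of the setup I have in mind (toy `nn.Linear` layers standing in for model1/model2/model3; the names are just placeholders):

```python
import torch
import torch.nn as nn

# Toy stand-ins for model1, model2, model3
model1 = nn.Linear(4, 4)
model2 = nn.Linear(4, 4)
model3 = nn.Linear(4, 4)

x = torch.randn(2, 4)
a = model1(x)            # `a` is attached to model1's graph

# .detach() returns a NEW tensor; the original `a` stays attached
a_det = a.detach()
print(a_det.requires_grad)          # False
print(a.requires_grad)              # True

# Loss computed through the detached copy: only model2 gets gradients
loss2 = model2(a_det).sum()
loss2.backward()
print(model1.weight.grad is None)   # True -> model1 untouched

# Loss computed through the original `a`: gradients reach model1 too
loss3 = model3(a).sum()
loss3.backward()
print(model1.weight.grad is None)   # False -> model1 got gradients
```

So if I understand correctly, `a.detach()` is out-of-place: it hands back a new tensor (sharing storage) that is cut off from the graph, while `a` itself keeps its history, so backpropagating through `a` into model3 would also update model1.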