In the following setup, all models are nn.Modules (not necessarily stored as separate top-level objects). What would be the correct way of processing such a setup when all models/modules must be updated?
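A minimal sketch of the kind of pipeline being asked about, assuming m1, m2, and m3 are chained so that o1 feeds both m2 and m3 (the Linear layers and shapes are placeholders, not the actual models):

```python
import torch
import torch.nn as nn

# Hypothetical shapes; the key point is that o1 is consumed by both m2 and m3.
m1 = nn.Linear(10, 10)
m2 = nn.Linear(10, 10)
m3 = nn.Linear(20, 1)

x = torch.randn(4, 10)
o1 = m1(x)
o2 = m2(o1)
o3 = m3(torch.cat([o1, o2], dim=1))  # o1 is reused here, hence the question below
```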
Specifically, do I need to detach and clone o1 before using it to compute o3, or perhaps already before computing o2?
EDIT: The reason for asking is that when I checked the weights after training, I got the impression that some weights in m3 were not being updated (i.e., frozen). I checked: all weights have requires_grad=True.
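One way to verify whether weights are actually frozen is to snapshot the parameters before an optimizer step and compare afterwards; a self-contained sketch, using a placeholder Linear layer standing in for m3:

```python
import torch
import torch.nn as nn

m3 = nn.Linear(20, 1)  # placeholder for the real m3
opt = torch.optim.SGD(m3.parameters(), lr=0.1)

# Snapshot the parameters before the step so actual updates can be detected.
before = {n: p.detach().clone() for n, p in m3.named_parameters()}

loss = m3(torch.randn(4, 20)).sum()
loss.backward()
opt.step()

for n, p in m3.named_parameters():
    print(n,
          "grad set:", p.grad is not None,
          "changed:", not torch.equal(before[n], p.detach()))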
Basically, m3 will also have gradients after your loss.backward().
You can try the following code, which simulates your situation; you will get gradients on m3. (The concrete layers are placeholders standing in for your m1/m2/m3, since the actual modules aren't shown.)
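```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder modules standing in for your m1, m2, m3.
m1 = nn.Linear(10, 10)
m2 = nn.Linear(10, 10)
m3 = nn.Linear(20, 1)

x = torch.randn(4, 10)
o1 = m1(x)                               # no detach()/clone() anywhere
o2 = m2(o1)
o3 = m3(torch.cat([o1, o2], dim=1))

loss = o3.sum()
loss.backward()

# All three modules receive gradients through the shared graph.
for name, module in [("m1", m1), ("m2", m2), ("m3", m3)]:
    print(name, "weight grad norm:", module.weight.grad.norm().item())
```

Note that detaching o1 would do the opposite of what you want: gradients flowing back from o3 (or o2) would then stop at o1 and never reach m1. When every module must be updated, no detach() or clone() is needed.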