The difference between optimizing the model as a whole and as two parts

Andybert · October 15, 2018, 1:08pm

Thanks!
The inputs for the two cases are the same, and so is the training strategy. The only difference between them is the one mentioned above. I've run both cases many times, and the results consistently differ.
Is there any difference between the following two snippets?

torch.nn.utils.clip_grad_norm_(Model_All.parameters(), 5)
optimizer_all.step()

and

torch.nn.utils.clip_grad_norm_(Model_A.parameters(), 5)
torch.nn.utils.clip_grad_norm_(Model_B.parameters(), 5)
optimizer_a.step()
optimizer_b.step()
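One concrete way the two snippets can differ: `clip_grad_norm_` computes a single L2 norm over *all* the parameters it is given, so clipping `Model_All` once is not the same as clipping `Model_A` and `Model_B` separately. The sketch below re-implements that scaling rule in plain Python for two parts whose gradients each have norm 4, with `max_norm=5`. The function `clip_coef` and the gradient values are hypothetical, chosen only for illustration.

```python
import math


def clip_coef(grads, max_norm):
    """Scale factor that an L2 clip_grad_norm_ call would apply to `grads`.

    Re-implements the documented rule: total_norm is the L2 norm over all
    gradient entries together, and every gradient is multiplied by
    min(1, max_norm / (total_norm + 1e-6)).
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    return min(1.0, max_norm / (total_norm + 1e-6))


# Hypothetical flattened gradients of the two parts; each has L2 norm 4.
grads_a = [4.0, 0.0]
grads_b = [0.0, 4.0]

# Snippet 1: one call over all parameters -> combined norm sqrt(32) ~ 5.66 > 5.
joint = clip_coef(grads_a + grads_b, 5.0)

# Snippet 2: one call per part -> each norm is 4 < 5, so nothing is clipped.
sep_a = clip_coef(grads_a, 5.0)
sep_b = clip_coef(grads_b, 5.0)

print(f"joint scale: {joint:.4f}")
print(f"separate scales: {sep_a:.4f}, {sep_b:.4f}")
```

With these numbers the joint call scales every gradient by about 0.884, while the separate calls leave both parts untouched. More generally, assuming both parts have nonzero gradients, the two snippets agree only when the combined norm stays at or below `max_norm`, i.e. when neither variant clips anything.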

If the two cases have the same model state before this epoch, and the inputs to both models are the same as well, will I get two exactly identical models after this epoch?
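To check this directly, here is a minimal PyTorch sketch, assuming plain SGD on CPU and hypothetical `nn.Linear` stand-ins for `Model_A` and `Model_B` (`run_one_step` and `same_result` are illustrative helpers, not library functions). Both setups start from deep copies of the same initial weights and see the same batch; only the optimizer and clipping layout differs.

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for Model_A and Model_B (Model_All = A followed by B).
base_a = nn.Linear(4, 8)
base_b = nn.Linear(8, 1)
x = torch.randn(16, 4)
y = torch.randn(16, 1)


def run_one_step(split, max_norm, lr=0.1):
    """Run one SGD step from the shared initial state and return all parameters."""
    model_a, model_b = copy.deepcopy(base_a), copy.deepcopy(base_b)
    params_a = list(model_a.parameters())
    params_b = list(model_b.parameters())

    if split:
        # Second snippet: two optimizers, gradients clipped per part.
        groups = [params_a, params_b]
    else:
        # First snippet: one optimizer, gradients clipped over everything.
        groups = [params_a + params_b]
    optimizers = [torch.optim.SGD(g, lr=lr) for g in groups]

    loss = nn.functional.mse_loss(model_b(model_a(x)), y)
    loss.backward()
    for g in groups:
        nn.utils.clip_grad_norm_(g, max_norm)
    for opt in optimizers:
        opt.step()
    return [p.detach().clone() for p in params_a + params_b]


def same_result(max_norm):
    """True if both layouts produce bit-identical parameters after one step."""
    joint = run_one_step(split=False, max_norm=max_norm)
    split = run_one_step(split=True, max_norm=max_norm)
    return all(torch.equal(p, q) for p, q in zip(joint, split))


print("clipping inactive (max_norm=1e6):", same_result(1e6))
print("clipping active (max_norm=1e-2):", same_result(1e-2))
```

On CPU the first comparison should come out equal and the second should not: SGD updates each parameter independently, so splitting the optimizer alone changes nothing, while clipping over different parameter sets changes the gradient scale. On GPU, nondeterministic kernels can additionally make even equivalent setups differ slightly from run to run.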