Thanks!
The inputs for the two cases are the same, and so is the training strategy. The only difference between them is the one shown above. I've run both cases many times, and the difference consistently shows up.
Is there any difference between the following two snippets?
```python
# Case 1: clip and step the combined model with a single optimizer
torch.nn.utils.clip_grad_norm_(Model_All.parameters(), 5)
optimizer_all.step()
```
and
```python
# Case 2: clip and step each sub-model with its own optimizer
torch.nn.utils.clip_grad_norm_(Model_A.parameters(), 5)
torch.nn.utils.clip_grad_norm_(Model_B.parameters(), 5)
optimizer_a.step()
optimizer_b.step()
```
If the two cases have the same model state before this epoch, and the inputs to the models are the same as well, will I get two exactly identical models after this epoch?
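One place the two snippets can diverge is the clipping step. `clip_grad_norm_` computes a single L2 norm over all the gradients it is given and, when that norm exceeds `max_norm`, scales every one of those gradients by the same factor. The sketch below reimplements that math in plain Python (it does not call PyTorch), assuming that the parameters of `Model_All` are exactly those of `Model_A` and `Model_B` combined; `clip_by_total_norm` is a hypothetical helper and the gradient values are made up for illustration.

```python
import math


def clip_by_total_norm(grads, max_norm, eps=1e-6):
    # Hypothetical helper mirroring the math of torch.nn.utils.clip_grad_norm_
    # with norm_type=2: ONE norm is computed over every gradient passed in...
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    coef = max_norm / (total_norm + eps)
    if coef < 1.0:
        # ...and every gradient is rescaled by that same coefficient.
        grads = [[g * coef for g in vec] for vec in grads]
    return grads, total_norm


# Made-up gradients: model A has gradient norm 5, model B has norm 10.
grads_a = [[3.0, 4.0]]
grads_b = [[6.0, 8.0]]

# Case 1: a single call over the parameters of both models (like Model_All).
joint, joint_norm = clip_by_total_norm(grads_a + grads_b, max_norm=5.0)

# Case 2: one call per model (like Model_A and Model_B separately).
sep_a, norm_a = clip_by_total_norm(grads_a, max_norm=5.0)
sep_b, norm_b = clip_by_total_norm(grads_b, max_norm=5.0)

print(joint_norm)  # ~11.18, one norm over A and B together
print(joint[0])    # A's gradient in case 1, roughly [1.34, 1.79]
print(sep_a[0])    # A's gradient in case 2, roughly [3.0, 4.0]
```

With these numbers, model A's gradient is shrunk by a factor of about 0.447 in case 1 but left essentially unchanged in case 2, because its own norm is only 5. When the joint norm stays below 5, neither call rescales anything, so clipping alone would not make the two cases differ.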