Your code should work, and both models as well as both optimizers will be independent.
However, if the models don't contain any randomness and you feed them the same data and targets, the training should be identical (unless non-deterministic operations with an undefined order of operations are used internally).
Here is a small example:
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F

def check_models(model1, model2):
    # Compare the parameters pairwise and report whether they match
    for (name1, param1), (name2, param2) in zip(model1.named_parameters(),
                                                model2.named_parameters()):
        print('name {}, is equal {}'.format(
            name1, torch.allclose(param1, param2)))

def update_step(model, optimizer, x, y):
    optimizer.zero_grad()
    output = model(x)
    loss = F.mse_loss(output, y)
    loss.backward()
    optimizer.step()

model1 = nn.Sequential(
    nn.Linear(10, 10),
    nn.ReLU(),
    #nn.Dropout(),
    nn.Linear(10, 10)
)
model2 = copy.deepcopy(model1)
check_models(model1, model2)

lr = 1e-3
optimizer1 = torch.optim.SGD(model1.parameters(), lr)
optimizer2 = torch.optim.SGD(model2.parameters(), lr)

for i in range(2):
    x = torch.randn(10, 10)
    y = torch.randn(10, 10)
    update_step(model1, optimizer1, x, y)
    update_step(model2, optimizer2, x, y)
    check_models(model1, model2)
As you can see, the updated parameters stay identical unless the dropout layer in the model is enabled, since each model would then sample its own dropout mask.
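If you do need dropout (or another source of randomness), you can still keep both models in sync by re-seeding the global RNG right before each model's forward/backward pass, so that both sample the same dropout mask. A minimal sketch of this idea (the seed value 0 is an arbitrary choice):

```python
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F

model1 = nn.Sequential(
    nn.Linear(10, 10),
    nn.ReLU(),
    nn.Dropout(),  # dropout is now enabled
    nn.Linear(10, 10)
)
model2 = copy.deepcopy(model1)

optimizer1 = torch.optim.SGD(model1.parameters(), lr=1e-3)
optimizer2 = torch.optim.SGD(model2.parameters(), lr=1e-3)

x = torch.randn(10, 10)
y = torch.randn(10, 10)

for model, optimizer in ((model1, optimizer1), (model2, optimizer2)):
    torch.manual_seed(0)  # same seed -> same dropout mask in both models
    optimizer.zero_grad()
    loss = F.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()

# The parameters still match despite the active dropout layers
for p1, p2 in zip(model1.parameters(), model2.parameters()):
    print(torch.allclose(p1, p2))
```

Without the `torch.manual_seed(0)` call inside the loop, the two dropout masks would differ and the parameters would diverge after the first update.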