Your code should work, and both models as well as both optimizers will be independent.
However, if the models don't contain any randomness and you feed them the same data and targets, the training should be identical (unless non-deterministic operations with an undefined order of operations are used internally).
Here is a small example:
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F

def check_models(model1, model2):
    # Compare the parameters pairwise and report whether they match
    for (name1, param1), (name2, param2) in zip(model1.named_parameters(),
                                                model2.named_parameters()):
        print('name {}, is equal {}'.format(
            name1, torch.allclose(param1, param2)))

def update_step(model, optimizer, x, y):
    optimizer.zero_grad()
    output = model(x)
    loss = F.mse_loss(output, y)
    loss.backward()
    optimizer.step()

model1 = nn.Sequential(
    nn.Linear(10, 10),
    nn.ReLU(),
    #nn.Dropout(),
    nn.Linear(10, 10)
)
model2 = copy.deepcopy(model1)
check_models(model1, model2)

lr = 1e-3
optimizer1 = torch.optim.SGD(model1.parameters(), lr)
optimizer2 = torch.optim.SGD(model2.parameters(), lr)

for i in range(2):
    x = torch.randn(10, 10)
    y = torch.randn(10, 10)
    update_step(model1, optimizer1, x, y)
    update_step(model2, optimizer2, x, y)
    check_models(model1, model2)
As you can see, the updated parameters stay identical unless the dropout layer in the model is enabled, since each model would then sample its own dropout mask.
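If you do need dropout (or another source of randomness), you can still keep both models in sync by re-seeding the global RNG right before each model's forward/backward pass, so that both sample the same dropout mask. A minimal sketch of this idea (the seed value 0 is an arbitrary choice):

```python
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F

model1 = nn.Sequential(
    nn.Linear(10, 10),
    nn.ReLU(),
    nn.Dropout(),  # dropout is now enabled
    nn.Linear(10, 10)
)
model2 = copy.deepcopy(model1)

optimizer1 = torch.optim.SGD(model1.parameters(), lr=1e-3)
optimizer2 = torch.optim.SGD(model2.parameters(), lr=1e-3)

x = torch.randn(10, 10)
y = torch.randn(10, 10)

for model, optimizer in ((model1, optimizer1), (model2, optimizer2)):
    torch.manual_seed(0)  # same seed -> same dropout mask in both models
    optimizer.zero_grad()
    loss = F.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()

# The parameters still match despite the active dropout layers
for p1, p2 in zip(model1.parameters(), model2.parameters()):
    print(torch.allclose(p1, p2))
```

Without the `torch.manual_seed(0)` call inside the loop, the two dropout masks would differ and the parameters would diverge after the first update.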