Training with multiple copies of a model

Hi,

How can I create multiple copies of a model and backpropagate the loss through all of them correctly? I have a model, and I simply did the following:

model1 = model
model2 = model
model3 = model

Firstly, is this correct, or should I use deepcopy (and how is deepcopy different from the above)?

Secondly, after I create/copy model1, model2, and model3, when I execute the line:

model.to(device)

I notice that all four models are moved to the device. So if I move model to the GPU, the other three models are moved to the GPU as well, and the same happens for the CPU. Why is that?

Thirdly, I try to delete the model (or the copied models) by calling del model, but it is still not deleted, and 'model' in locals() returns True (the same is true for the other models even after I delete them via del). Any explanation for this behavior?

Lastly, and importantly, I want to compute the loss of each model independently. Something like:

loss1 = criterion(outputs1, labels)
loss2 = criterion(outputs2, labels)
loss3 = criterion(outputs3, labels)

How can I aggregate the losses (is loss = loss1 + loss2 + loss3 okay?) so that the total loss is backpropagated to all the models? Also, how should I call the .backward() and .step() functions in this case?

Any help on this will be much appreciated.

That happens because model1, model2, model3, and model all refer to the same object; plain assignment only binds a new name to the existing module. You have to use copy.deepcopy() to create independent copies of the model. This also answers your second question: since all four names point at the same module, model.to(device) moves that one module, and copy.deepcopy() fixes that as well.
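
For illustration, here is a minimal sketch of the difference (nn.Linear stands in here for your model):

import copy

import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # stand-in for your model

# Plain assignment only creates another name for the same object.
model1 = model
print(model1 is model)  # True

# deepcopy creates an independent module with its own parameter tensors.
model2 = copy.deepcopy(model)
print(model2 is model)  # False
print(model2.weight.data_ptr() == model.weight.data_ptr())  # False

# Moving the deep copy no longer moves the original.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model2.to(device)  # model and model1 are unaffected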

Yes, you can compute:

loss1 = criterion(outputs1, labels)
loss2 = criterion(outputs2, labels)
loss3 = criterion(outputs3, labels)
loss = loss1 + loss2 + loss3
loss.backward()
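
A single backward() call on the summed loss reaches all three models, because the autograd graph of the sum spans all of them; gradients are accumulated into the parameters of each model.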

Regarding optimizer.step(), you first have to add all of your models' parameters to the optimizer before calling step(), e.g.:

optimizer = torch.optim.SGD(
    list(model1.parameters()) + list(model2.parameters()) + list(model3.parameters()),
    lr=0.01,  # example learning rate
)
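
Putting it together, a minimal training-step sketch; dataloader and criterion here are placeholders for your own data loader and loss function:

for inputs, labels in dataloader:  # your own DataLoader
    optimizer.zero_grad()          # clear gradients for all three models
    loss1 = criterion(model1(inputs), labels)
    loss2 = criterion(model2(inputs), labels)
    loss3 = criterion(model3(inputs), labels)
    loss = loss1 + loss2 + loss3
    loss.backward()                # gradients flow into all three models
    optimizer.step()               # one step updates all three models

Alternatively, you could give each model its own optimizer and call step() on each; a single optimizer over the concatenated parameter lists is just the most compact option.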