Multiple model instances with common weight updates

I have a model called CombinedModel() which itself consists of 3 smaller blocks (named submodel1, submodel2, submodel3).

I created multiple instances of this CombinedModel() like so:

model1 = CombinedModel()
model2 = CombinedModel()
model3 = CombinedModel()

  1. When checking the layer weights individually (e.g. model1.submodel1.layer[0].weight) I observed that the weights in all three model instances are identical. Is this purely due to a similar random initialization with the same random seed?

  2. I proceeded to train only the first model model1:

def train_model(model, optimizer):
    model.train()
    # Loop over epochs / batches (data loading omitted here;
    # dataloader and criterion stand in for my actual data and loss)
    for inputs, targets in dataloader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()

# Create the model and optimizer instances
model1 = CombinedModel()
optimizer = torch.optim.Adam(model1.parameters())

# Run the training function
train_model(model1, optimizer)

After doing so I checked the layer weights of all 3 models and found that the weights were again identical. How does training model1 on its own affect the weights of the other models as well?

model1.submodule1.layer[0].weight
model2.submodule1.layer[0].weight
model3.submodule1.layer[0].weight

The above three are all the same, despite model2 and model3 not being trained.
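One way to check whether these are literally the same tensor objects (and not just equal values) would be something like this sketch, assuming the attribute names above:

# True here would mean the very same parameter tensor is shared across instances
print(model1.submodule1.layer[0].weight is model2.submodule1.layer[0].weight)

# Comparing storage pointers gives the same information
print(model1.submodule1.layer[0].weight.data_ptr() ==
      model3.submodule1.layer[0].weight.data_ptr())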

Even when I create a new model instance (model4 = CombinedModel()) after model1 has finished training, model4 also has the same weights as the other three models.

Why is it that each instance has the same weights? And how can I avoid this weight dependency among instances?
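For comparison, this is what I would expect from two independently constructed layers (a small standalone sketch, not tied to my model):

import torch
import torch.nn as nn

torch.manual_seed(0)
a = nn.Linear(4, 4)
b = nn.Linear(4, 4)

# Even with a seed set once at the start, the two layers draw different
# values from the RNG stream, so their initial weights should differ.
print(torch.equal(a.weight, b.weight))  # expected: False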

I'm not sure, but the problem might lie in how CombinedModel() is defined.

class CombinedModel(nn.Module):

    def __init__(self, mod1=submodule1(), mod2=submodule2(), mod3=submodule3()):
        super().__init__()

        self.mod1 = mod1
        self.mod2 = mod2
        self.mod3 = mod3

    def forward(self, x):
        return self.mod3(self.mod2(self.mod1(x)))

Is the fact that I am instantiating the submodules inside the definition of the combined model class making every instance point to the same weight tensors?
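If that is what is happening, I would guess the pre-built submodules show up as the default values stored on __init__ itself (just a sketch of how I'd check):

# Default argument values live on the function object; if the same three
# submodule instances are listed here, every CombinedModel() call that
# relies on the defaults would receive these exact objects.
print(CombinedModel.__init__.__defaults__)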

@ptrblck any chance you could have a look when you get the time?

Hey @utsavdutta98
Could you share a Colab notebook (or similar) where the whole snippet is easily accessible?

I would really like to look at the submodule classes.

@ariG23498 Yeah sure!
Colab Notebook

The first three cells contain the modules, and the final one combines them.

I have printed the weight tensors of each afterwards too.

Thanks for your help :slight_smile:

Hey @utsavdutta98
I can confirm that the way the CombinedModel is created makes all the difference.

I made this gist to help you spot the difference.

It is wise not to create objects in default argument values.

I wrote a quick Python snippet to investigate the issue:

class A:
    def __init__(self):
        pass

class B:
    def __init__(self, a=A()):
        self.a = a

###############################
obj1 = B()
obj1.a
<__main__.A at 0x7f315079e910>

obj2 = B()
obj2.a
<__main__.A at 0x7f315079e910>

They are the same object. Hope this helps! :grinning_face_with_smiling_eyes:
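If it helps, here is a sketch of how I would restructure CombinedModel so that each instance builds its own submodules (assuming the submodule classes from your notebook):

import torch.nn as nn

class CombinedModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Each CombinedModel() call now constructs fresh submodules,
        # so every instance gets its own independently initialized weights.
        self.mod1 = submodule1()
        self.mod2 = submodule2()
        self.mod3 = submodule3()

    def forward(self, x):
        return self.mod3(self.mod2(self.mod1(x)))

If you still want to be able to pass submodules in, the usual pattern is to default the arguments to None and create the submodule inside __init__ when nothing is passed, so the default is built per instance.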

@ariG23498 ah that clears it up!

I'm not too strong on OOP theory; is there an underlying reason why creating the object in the default argument makes all instances point to the same object?

Either way, I will definitely keep this in mind from a practical standpoint.
Thanks a lot :slight_smile: