ResNet has four components: layer1, layer2, layer3, and layer4.

Hypothesis one:
I initialize two branches from the same layer4, like this: self.layer4_1 = model_resnet.layer4 and self.layer4_2 = model_resnet.layer4
That means layer4_1 and layer4_2 point to the same parameters, so they share a single set of weights. I update the two branches alternately.
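
To illustrate the sharing described in Hypothesis one, here is a minimal sketch. It assumes a small nn.Linear as a stand-in for model_resnet.layer4 (so no pretrained weights are needed), and the class name TwoBranches is hypothetical. It only checks that assigning one module to two attributes gives two names for the same object, not two copies.

```python
import torch
import torch.nn as nn

class TwoBranches(nn.Module):  # hypothetical wrapper, for illustration only
    def __init__(self, shared_layer):
        super().__init__()
        # Both attributes are assigned the very same module object.
        self.layer4_1 = shared_layer
        self.layer4_2 = shared_layer

shared = nn.Linear(4, 4)  # assumption: stand-in for model_resnet.layer4
model = TwoBranches(shared)

# Same Python object, and the weight tensors live in the same memory.
same_object = model.layer4_1 is model.layer4_2
same_storage = (
    model.layer4_1.weight.data_ptr() == model.layer4_2.weight.data_ptr()
)

# parameters() removes duplicates, so the shared weight and bias
# are each listed once.
num_params = len(list(model.parameters()))

print(same_object, same_storage, num_params)
```

If same_object is true, any optimizer step that changes layer4_1 changes layer4_2 as well, because there is only one set of weights.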

Hypothesis two:
I define only one layer4, like this: self.layer4 = model_resnet.layer4
And I update layer4 twice as many times as each branch in Hypothesis one, so the total number of updates is the same.

I want to ask: what is the difference between the two hypotheses? Why are the results different once the models have converged?

My English is poor. If you are Chinese, we can talk in Chinese.

Hello and welcome! I like your post, and well done with the images.

You stated that we point to the same layer in hypothesis 1/top image. Whenever we update one of the boxes, the other box is also updated, right?

Then in hypothesis 2: every time you update the box, that’s the same as updating one of the boxes in hypothesis 1, since they are connected. You say that in hypothesis 2 the update happens twice, so there are double the number of updates compared to each box in hypothesis 1, right?

In hypothesis 1/top image, layer4_1 and layer4_2 share the same parameters, which are the same as those of layer4 in hypothesis 2/bottom image.

I update layer4_1 and layer4_2 alternately in hypothesis 1/top image, and I update layer4 twice in hypothesis 2/bottom image to get the same number of updates as in hypothesis 1.

Then the results should be the same, as you suggest, at least to my understanding.

As a sanity check, I would make sure that the two models are identical after one and two update steps. Maybe PyTorch does something funky, like making a copy when you assign the same module to two attributes. I’m not sure.
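
The sanity check suggested above could look roughly like this. It is only a sketch under several assumptions: a small nn.Linear stands in for model_resnet.layer4, the loss is a made-up squared-output mean, the optimizer is plain SGD without momentum, and the names Shared, Single, and step are hypothetical. Your real training loop (other optimizer state, BatchNorm statistics, different losses per branch) may behave differently, so treat it as a template rather than a verdict.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
base = nn.Linear(8, 8)  # assumption: stand-in for model_resnet.layer4
batches = [torch.randn(4, 8) for _ in range(2)]  # same data for both setups

class Shared(nn.Module):  # hypothesis 1: two names, one module
    def __init__(self, layer):
        super().__init__()
        self.layer4_1 = layer
        self.layer4_2 = layer

class Single(nn.Module):  # hypothesis 2: one name, one module
    def __init__(self, layer):
        super().__init__()
        self.layer4 = layer

# Each setup gets its own copy of the same initial weights.
model_a = Shared(copy.deepcopy(base))
model_b = Single(copy.deepcopy(base))
opt_a = torch.optim.SGD(model_a.parameters(), lr=0.1)
opt_b = torch.optim.SGD(model_b.parameters(), lr=0.1)

def step(opt, branch, x):
    """One gradient step through the given branch on batch x."""
    opt.zero_grad()
    loss = branch(x).pow(2).mean()  # placeholder loss, for illustration
    loss.backward()
    opt.step()

# Hypothesis 1: alternate between the two aliases.
step(opt_a, model_a.layer4_1, batches[0])
step(opt_a, model_a.layer4_2, batches[1])

# Hypothesis 2: update the single layer twice on the same batches.
step(opt_b, model_b.layer4, batches[0])
step(opt_b, model_b.layer4, batches[1])

# Compare the parameters of the two setups after two update steps.
models_match = all(
    torch.allclose(pa, pb)
    for pa, pb in zip(model_a.parameters(), model_b.parameters())
)
print("parameters match after two steps:", models_match)
```

If the parameters match in a stripped-down check like this but the full models still diverge, the difference probably comes from something outside the shared layer, such as how the optimizer was built or how the two branches are fed.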