Same architecture, but different training progress

These two networks have the same structure; only the last layer of resnet34 has been modified to output 3 values. For Network1, I split the network into three groups so I can apply different operations to each group later, but the architecture itself is identical to Network2.

import torch.nn as nn
from torchvision.models import resnet34

# Flatten is not built in to older versions of PyTorch; a minimal definition:
class Flatten(nn.Module):
	def forward(self, x):
		return x.view(x.size(0), -1)

class Network1(nn.Module):
	def __init__(self):
		super(Network1, self).__init__()
		pretrained_model = resnet34(pretrained=True)

		# split the backbone into three groups
		self.group1 = nn.Sequential(*list(pretrained_model.children())[0:6])
		self.group2 = nn.Sequential(*list(pretrained_model.children())[6:8])
		self.group3 = nn.Sequential(
			nn.AdaptiveAvgPool2d(1),
			Flatten(),
			nn.Linear(512, 3)
		)

	def forward(self, image):
		out = self.group3(self.group2(self.group1(image)))
		return out
        
class Network2(nn.Module):
	def __init__(self):
		super(Network2, self).__init__()
		pretrained_model = resnet34(pretrained=True)

		# same backbone and head, kept in a single group
		self.group3 = nn.Sequential(
			*list(pretrained_model.children())[0:8],
			nn.AdaptiveAvgPool2d(1),
			Flatten(),
			nn.Linear(512, 3)
		)

	def forward(self, image):
		out = self.group3(image)
		return out

I therefore expected these two networks to train at a similar pace, but oddly Network2 trains much faster than Network1.

The two networks output the same values when I feed in random noise, so I think the problem is somewhere in the training process. Any ideas?
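As a side note, the fact that nesting layers inside extra nn.Sequential groups does not change the forward pass can be checked directly. A minimal sketch of that equivalence check, using small stand-in layers instead of the full resnet34 so it runs quickly (the layer shapes here are illustrative, not the real network's):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Small stand-in layers in place of the resnet34 backbone.
layers = [
	nn.Conv2d(3, 8, 3, padding=1),
	nn.ReLU(),
	nn.AdaptiveAvgPool2d(1),
	nn.Flatten(),
	nn.Linear(8, 3),
]

# Grouped like Network1 (nested Sequentials) vs. flat like Network2.
# Note both wrappers share the SAME underlying layer objects here, which
# isolates the grouping as the only difference.
grouped = nn.Sequential(nn.Sequential(*layers[:2]), nn.Sequential(*layers[2:]))
flat = nn.Sequential(*layers)

noise = torch.randn(4, 3, 16, 16)
grouped.eval()
flat.eval()
with torch.no_grad():
	print(torch.allclose(grouped(noise), flat(noise)))  # True: grouping does not change the computation
```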

The models should be identical, and you’ve already verified that they produce the same outputs, so I think the code is fine.
How many times did you repeat the experiment? Could the difference in training just be random variation that happened to favor one model?
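One way to rule out initialization luck is to start both models from identical weights and fix the random seed before each training run. Note that load_state_dict won’t work directly between these two classes, because their state_dict keys differ (group1.0.… / group2.0.… vs. a single group3.0.…); the parameters do appear in the same order, though, so they can be copied positionally. A sketch, assuming net1 and net2 are instances of Network1 and Network2:

```python
import torch

def copy_parameters(src, dst):
	# state_dict keys differ between the two models (e.g. "group1.0.weight"
	# vs. "group3.0.weight"), so copy tensors by position instead of by name.
	with torch.no_grad():
		for p_src, p_dst in zip(src.parameters(), dst.parameters()):
			p_dst.copy_(p_src)
		# Also copy buffers (BatchNorm running mean/var), which are
		# not returned by parameters().
		for b_src, b_dst in zip(src.buffers(), dst.buffers()):
			b_dst.copy_(b_src)

# copy_parameters(net1, net2)
torch.manual_seed(0)  # fix the seed so shuffling/augmentation match across runs
```

With identical starting weights and a fixed seed, any remaining gap in training speed would point at the training loop itself rather than the models.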