These two networks have the same architecture; the only change is that the last layer of resnet34 has been replaced to output 3 values. In Network1 I split the network into 3 groups so I can apply other operations to them later, but the computation itself is identical.
import torch.nn as nn
from torchvision.models import resnet34

class Network1(nn.Module):
    def __init__(self):
        super(Network1, self).__init__()
        pretrained_model = resnet34(pretrained=True)
        # children [0:6]: stem (conv1, bn1, relu, maxpool) + layer1, layer2
        # children [6:8]: layer3, layer4
        self.group1 = nn.Sequential(*list(pretrained_model.children())[0:6])
        self.group2 = nn.Sequential(*list(pretrained_model.children())[6:8])
        self.group3 = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(512, 3)
        )

    def forward(self, image):
        out = self.group3(self.group2(self.group1(image)))
        return out
class Network2(nn.Module):
    def __init__(self):
        super(Network2, self).__init__()
        pretrained_model = resnet34(pretrained=True)
        self.group3 = nn.Sequential(
            *list(pretrained_model.children())[0:8],
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(512, 3)
        )

    def forward(self, image):
        out = self.group3(image)
        return out
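Nesting submodules in nn.Sequential containers should not change the forward computation at all. Here is a minimal sketch of that claim, using a small hypothetical stand-in backbone (so it runs without downloading resnet34 weights) and sharing the same layer objects between a grouped and a flat arrangement:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical tiny backbone, standing in for resnet34's feature layers.
backbone = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU(),
    nn.Conv2d(8, 512, 3, padding=1), nn.ReLU(),
)
head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(512, 3))

parts = list(backbone.children())
# Grouped like Network1 vs. flat like Network2, over the same modules.
net1 = nn.Sequential(nn.Sequential(*parts[:3]), nn.Sequential(*parts[3:]), head)
net2 = nn.Sequential(*parts, head)

net1.eval()  # eval mode so batchnorm uses fixed running stats
net2.eval()
x = torch.randn(2, 3, 32, 32)
with torch.no_grad():
    print(torch.allclose(net1(x), net2(x)))  # True
```

Since the grouping is transparent to the forward pass, any speed difference in training has to come from something outside the architecture, e.g. how the optimizer or other training code treats the two networks.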
Therefore I expected the two networks to train at a similar pace, but oddly Network2 trains much faster than Network1.
Both networks output the same values when I feed them the same random noise, so I think the problem lies in the training process rather than the architectures. Any ideas?