Why does my training loss of ResNet50 not converge?

Hi all. I’m currently a PhD student at Duke University and I’m using PyTorch to conduct my research. It’s nice to find such a great forum!

I define the following network architecture where model_resnet50_bn is a pre-trained ResNet50 with Batch Normalization layers in between.

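```python
import torch
import torch.nn as nn


class MyPipeline(nn.Module):
    def __init__(self, image_size, transformed_meteo_size, num_classes=1000):
        super(MyPipeline, self).__init__()
        # Pretrained ResNet50 with BatchNorm layers, defined elsewhere
        self.resnet_pretrained = model_resnet50_bn
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        in_features = 2048  # channels of the ResNet50 final feature map
        self.fc = nn.Linear(in_features, num_classes)
        self.dropout = nn.Dropout(p=0.6)
        self.elu = nn.ELU()
        # The concatenated vector is self.fc's output width plus the meteo width
        self.fc1 = nn.Linear(self.fc.out_features + transformed_meteo_size, 300)
        self.fc2 = nn.Linear(300, 1)

    def forward(self, image, transformed_meteo_features):
        # Assumes the backbone returns a sequence whose first element is the
        # (N, 2048, H, W) feature map; if it returns a plain tensor, [0]
        # would select only the first image of the batch instead.
        img_features = self.resnet_pretrained(image)
        img_features = self.avgpool(img_features[0])
        img_features = torch.flatten(img_features, 1)
        img_features = self.fc(img_features)
        # Concatenate image representations with transformed meteo features
        # (cast first so both tensors are float32)
        x = torch.cat((img_features, transformed_meteo_features.float()), dim=-1)
        x = self.dropout(x)
        x = self.fc1(x)
        x = self.elu(x)
        x = self.fc2(x)  # shape (N, 1)
        return x
```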

The last FC layer, fc2, outputs a single float, and each sample has a float target, so this is a regression task. I use nn.MSELoss() and the Adam optimizer with a learning rate of 0.0001 to train this network, but the MSE loss does not converge even after 500 training epochs. I have adjusted the learning rate several times, but the problem persists. Could this be related to the network architecture, or to some unexpected behavior of the autograd package? Any help will be appreciated!
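One frequent cause of a regression loss that plateaus (not confirmed as the cause in this thread) is a shape mismatch between prediction and target: fc2 returns shape (N, 1), and if the target tensor has shape (N,), nn.MSELoss broadcasts the two into an (N, N) difference matrix and only emits a warning about the target size differing from the input size. The sketch below shows a single training step with the setup described above (MSELoss, Adam, lr=1e-4); the small model and the synthetic data are hypothetical stand-ins for MyPipeline and the real dataset.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the regression network: any module whose output is (N, 1).
model = nn.Sequential(nn.Linear(8, 300), nn.ELU(), nn.Linear(300, 1))
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Hypothetical synthetic data: 64 samples, 8 features, one float target each.
inputs = torch.randn(64, 8)
targets = torch.randn(64)  # shape (N,), as a target loaded from a table often is


def train_step(inputs, targets):
    model.train()
    optimizer.zero_grad()
    preds = model(inputs)  # shape (N, 1)
    # Reshape the target to (N, 1). Without this, MSELoss broadcasts (N, 1)
    # against (N,) into (N, N), so the loss no longer measures per-sample error.
    loss = criterion(preds, targets.view(-1, 1))
    loss.backward()
    optimizer.step()
    return loss.item()


losses = [train_step(inputs, targets) for _ in range(200)]
```

Printing the shapes of the prediction and the target once before training is a cheap way to rule this out.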


I don’t think there is any reason for autograd to fail here.
You might want to make sure that you preprocess your data properly, though, to avoid large values, as they could hinder training.
The right architecture is very task-dependent, I’m afraid. But for vision-related tasks, a pretrained ResNet sounds like a good idea to me.
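The preprocessing advice above applies to the non-image inputs too. A minimal sketch of standardizing them, assuming the meteo features are available as a float tensor with one column per feature; the helper names and the example values are hypothetical. Statistics are computed on the training split only and then reused for validation and test data. The same transform can be applied to the regression target, with predictions mapped back through `unstandardize`.

```python
import torch


def fit_standardizer(train_values, eps=1e-8):
    """Return per-column mean and std, computed on the training split only."""
    mean = train_values.mean(dim=0)
    std = train_values.std(dim=0)
    return mean, std.clamp_min(eps)  # guard against constant columns


def standardize(values, mean, std):
    return (values - mean) / std


def unstandardize(values, mean, std):
    # Maps standardized values (e.g. model predictions) back to original units
    return values * std + mean


# Hypothetical meteo features on very different scales (pressure in Pa, humidity in %).
train_meteo = torch.tensor([[101325.0, 40.0], [100900.0, 55.0], [101800.0, 70.0]])
mean, std = fit_standardizer(train_meteo)
scaled = standardize(train_meteo, mean, std)
```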

Hi @albanD,

Thank you so much for your suggestion! For image preprocessing, I currently have only two steps: 1. center-crop the image to 110x110; 2. normalize the image with transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)). Does this look OK, or do you think I need to adjust the normalization parameters? I don’t use PyTorch very often, so I’m not sure which parameters to use when normalizing different types of images.

Hi @albanD

Do we have any built-in preprocessing function in PyTorch corresponding to tf.keras.applications.resnet.preprocess_input? Thanks in advance for your help!

Using 0.5 for every channel does not sound right to me: the parameters of that transform should be computed from your dataset’s statistics, not picked arbitrarily.
This is not specific to PyTorch, though; you would need similar preprocessing in most ML frameworks.
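A minimal sketch of computing those statistics, assuming a DataLoader that yields (images, target) batches with images already cropped and converted to [0, 1] float tensors but not yet normalized; `channel_stats` and the random stand-in data are hypothetical. The results can then be passed to `transforms.Normalize(mean.tolist(), std.tolist())`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def channel_stats(loader):
    """Per-channel mean and (population) std over every pixel the loader yields.

    Assumes each batch is (images, ...) with images shaped (N, C, H, W).
    """
    total = 0
    sum_ = None
    sum_sq = None
    for images, *_ in loader:
        images = images.double()  # accumulate in float64 to limit rounding error
        n_pixels = images.shape[0] * images.shape[2] * images.shape[3]
        batch_sum = images.sum(dim=(0, 2, 3))
        batch_sq = (images ** 2).sum(dim=(0, 2, 3))
        sum_ = batch_sum if sum_ is None else sum_ + batch_sum
        sum_sq = batch_sq if sum_sq is None else sum_sq + batch_sq
        total += n_pixels
    mean = sum_ / total
    std = (sum_sq / total - mean ** 2).sqrt()  # Var[x] = E[x^2] - E[x]^2
    return mean.float(), std.float()


# Hypothetical stand-in data: 10 random 3x110x110 "images" with float targets.
fake_images = torch.rand(10, 3, 110, 110)
loader = DataLoader(TensorDataset(fake_images, torch.randn(10)), batch_size=4)
mean, std = channel_stats(loader)
```

Accumulating sums per batch, rather than stacking the whole dataset in memory, keeps this usable for large image collections.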

> Do we have any built-in preprocessing function in PyTorch corresponding to tf.keras.applications.resnet.preprocess_input?

I guess that would be the preprocessing used for ImageNet images, no? Do your images have the same distribution? I think you need to check that, and make sure they do by applying the right normalization.

Hi @albanD

I think my images are not very similar to the ImageNet dataset, because they are 334x334 satellite images with strong spatial relationships. Also, my ResNet50 is not pretrained on ImageNet; it was pretrained with a self-supervised learning framework.