Why does ResNet50 training on downsampled TinyImageNet not converge?

Hi all, I am trying to train a ResNet50 from scratch on a downsampled TinyImageNet dataset. Unfortunately, my network does not converge on the test data. I use the following transforms and optimizer to train the model. Any advice would be really appreciated.

import os
import torch
from torch import optim
from torchvision import datasets, transforms

# Named data_transforms so the dict does not shadow the torchvision.transforms module.
data_transforms = {
    "train": transforms.Compose([
        transforms.ToTensor(),  # Normalize expects a tensor, not a PIL image
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])]),
    "test": transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])]),
}

trainset = datasets.ImageFolder(os.path.join(data_dir, "train"), data_transforms["train"])
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, shuffle=True, num_workers=0)

testset = datasets.ImageFolder(os.path.join(data_dir, "test"), data_transforms["test"])
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size, shuffle=False, num_workers=0)

optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-2)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10)

There is no single answer to give here, since you aren't doing anything obviously wrong. Can you share your training and test loss curves?

Also, here’s some generic advice from GPT-4 below that might help:

If your ResNet50 model is not converging on the test data using the TinyImageNet dataset, there are several aspects you can consider to improve performance:

  1. Data Augmentation:

    • Your current training pipeline only normalizes the images. Consider adding augmentations like random crop, random horizontal flip, random rotation, and color jitter. This can help the model generalize better.
    • Ensure that the test set transformations do not include any data augmentation techniques (like flipping or rotation), as you’ve correctly done. The test set should reflect real-world data.
  2. Learning Rate and Optimizer:

    • The initial learning rate of 0.01 might be too high or too low depending on the specific problem. Consider using a learning rate finder to identify a more suitable initial learning rate.
    • Try using different optimizers like Adam or AdamW, which might perform better in some scenarios compared to SGD.
  3. Batch Size:

    • Depending on your hardware capabilities, try experimenting with different batch sizes. A larger batch size can provide more stable gradient estimates, but be mindful of memory constraints.
  4. Weight Initialization:

    • Check if you are using an appropriate weight initialization method. Sometimes, using pre-trained weights as a starting point and fine-tuning the model on your dataset can yield better results.
  5. Network Architecture:

    • Ensure that the architecture is correctly implemented. If you are not using a pre-built model from a framework like PyTorch, double-check the layers and connections.
    • ResNet50 is a deep network. If the TinyImageNet dataset is significantly simpler than ImageNet, a less complex model might be more appropriate.
  6. Regularization:

    • You are using weight decay in your optimizer, which is good for regularization. Also, consider implementing dropout layers if overfitting is observed.
  7. Learning Rate Scheduler:

    • The CosineAnnealingLR scheduler you are using is generally a good choice. However, you might want to experiment with other schedulers or adjust the T_max parameter based on the number of epochs you are training.
  8. Loss Function:

    • Ensure you are using an appropriate loss function for your classification task, typically CrossEntropyLoss for multi-class classification.
  9. Model Evaluation:

    • While training, monitor not only the loss but also other metrics like accuracy, precision, recall, and F1-score. This gives a better understanding of how well your model is performing.
  10. Early Stopping:

    • Implement early stopping to prevent overfitting. This stops the training once the model performance ceases to improve on a validation set.
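As a concrete sketch of point 1, a training pipeline with some augmentation for 64x64 TinyImageNet images might look like this (the specific augmentations and their parameters are illustrative, not tuned values):

```python
from torchvision import transforms

# Example augmented training pipeline for 64x64 inputs; parameters are illustrative.
train_transform = transforms.Compose([
    transforms.RandomCrop(64, padding=4),   # pad by 4, then crop back to 64x64
    transforms.RandomHorizontalFlip(),      # flip half the images left-right
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),                  # PIL image -> float tensor in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```

The test transform should keep only ToTensor and Normalize, as point 1 notes.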
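If you experiment with point 2, a minimal AdamW setup with a cosine schedule stretched over the whole run might look like this (the learning rate, epoch count, and the stand-in model are placeholders, not recommendations):

```python
import torch
from torch import nn, optim

epochs = 90                      # placeholder; set T_max to your real epoch budget
model = nn.Linear(10, 2)         # stand-in for the ResNet50 in this sketch

optimizer = optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-2)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

for epoch in range(epochs):
    # ... forward pass, loss.backward() go here ...
    optimizer.step()             # normally called once per mini-batch
    scheduler.step()             # anneal the learning rate once per epoch
```

With T_max equal to the number of epochs, the learning rate decays smoothly to near zero by the end of training instead of restarting every 10 epochs.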
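And for point 10, early stopping can be sketched as a small helper like the one below (the class name, patience value, and method names are illustrative, not a standard API):

```python
class EarlyStopping:
    """Stop training when the monitored validation loss stops improving."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience      # epochs to wait after the last improvement
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best_loss = float("inf")
        self.counter = 0

    def step(self, val_loss):
        """Call once per epoch; returns True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss  # improvement: reset the counter
            self.counter = 0
        else:
            self.counter += 1          # no improvement this epoch
        return self.counter >= self.patience
```

You would call `step(val_loss)` at the end of each validation pass and break out of the training loop when it returns True.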

Remember, model convergence and performance can be a trial and error process, requiring iterative adjustments and experiments.


The image resolution in TinyImageNet is 64x64 right? I am wondering why you are further downscaling it.

Also, I would suggest rescaling the images to 224x224, just to avoid issues from the image scale.