NVIDIA Model Overfitting to training data

Thank you for the pointers to the paper.

I added data augmentation: for every image there is now a transformed copy, produced by one of the 13 transforms from this list: [Random Color Jitter (Hue, Saturation, Contrast, Brightness), Horizontal Flip, Vertical Flip, Histogram Equalization, Auto Contrast, Adjust Sharpness, Solarize, Posterize, Invert, Affine Transformation, Random Rotation, Random Perspective Shift, Gaussian Blur]
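
A minimal sketch of this one-transform-per-image scheme, not the exact code used here. The function name `augment_once`, the `seed` parameter, and the (name, callable) pairing are assumptions for illustration. The callables are left to the caller; with an image library they would be that library's real transform objects.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

# The 13 transform names from the list above. Only the names are fixed here;
# the actual transform functions are supplied by the caller.
TRANSFORM_NAMES = [
    "color_jitter", "horizontal_flip", "vertical_flip",
    "histogram_equalization", "auto_contrast", "adjust_sharpness",
    "solarize", "posterize", "invert", "affine",
    "rotation", "perspective", "gaussian_blur",
]


def augment_once(
    samples: Sequence[T],
    transforms: Sequence[Tuple[str, Callable[[T], T]]],
    seed: int = 0,
) -> List[Tuple[T, str]]:
    """Return every original sample plus one copy passed through a
    randomly chosen transform, so the dataset size doubles."""
    rng = random.Random(seed)  # seeded so the augmented set is reproducible
    out: List[Tuple[T, str]] = []
    for sample in samples:
        out.append((sample, "original"))
        name, fn = rng.choice(transforms)  # exactly one transform per image
        out.append((fn(sample), name))
    return out
```

Each output entry records which transform was applied, which makes it easy to check afterwards whether one particular transform is hurting validation accuracy.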

However, the training still does not improve.

Next, I coded up the ResNet-50 architecture, but it looks like it will take a while to train (about 23 million parameters, training set size = 430,000). I will post an update after running ResNet overnight.