I have been struggling with this problem for a long time and have probably run more than 100 experiments.
I am working on an object detection model for medical imaging using the SSD architecture with various ResNet backbones. I started with a resnet34 backbone and experimented with a lot of hyperparameters. The training loss curve looks fine in almost all cases, but the validation loss starts to increase even with weight_decay = 0.1, which is a really high value.
The metric on train_set comes out to be more than 90%, but on val_set the maximum I have achieved is 35%. The model is clearly overfitting heavily. I have used different augmentations that the medical experts have approved.
Another odd thing I noticed in the loss curves is that the validation classification loss generally starts to increase early, and after training for more epochs the localization loss also starts to increase.
Some other details:
Classification loss: Cross-Entropy with Hard Negative Mining
Localization loss: Smooth L1 loss
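The loss setup listed above can be sketched roughly as follows. This is a minimal illustration, not necessarily the exact implementation used here: the function name `ssd_loss`, the tensor shapes, background being class 0, and the 3:1 negative-to-positive ratio are all assumptions made for the example.

```python
import torch
import torch.nn.functional as F


def ssd_loss(cls_logits, box_preds, cls_targets, box_targets, neg_pos_ratio=3):
    """Hypothetical SSD-style loss: cross-entropy with hard negative mining
    plus Smooth L1 on positive anchors.

    cls_logits:  (B, A, C) raw class scores, class 0 assumed to be background
    box_preds:   (B, A, 4) predicted box offsets
    cls_targets: (B, A)    long tensor of matched labels (0 = background)
    box_targets: (B, A, 4) encoded ground-truth offsets
    Returns (classification_loss, localization_loss), both normalized by
    the number of positive anchors in the batch.
    """
    B, A, C = cls_logits.shape
    pos_mask = cls_targets > 0                    # anchors matched to an object
    num_pos = pos_mask.sum(dim=1, keepdim=True)   # positives per image, (B, 1)

    # Per-anchor cross-entropy, kept unreduced so anchors can be ranked.
    ce = F.cross_entropy(
        cls_logits.reshape(-1, C), cls_targets.reshape(-1), reduction="none"
    ).view(B, A)

    # Hard negative mining: rank background anchors by loss (highest first)
    # and keep at most neg_pos_ratio * num_pos of them per image.
    neg_scores = ce.detach().clone()
    neg_scores[pos_mask] = float("-inf")          # positives are never picked as negatives
    rank = neg_scores.argsort(dim=1, descending=True).argsort(dim=1)
    num_neg = torch.clamp(neg_pos_ratio * num_pos, max=A)
    neg_mask = (rank < num_neg) & ~pos_mask

    cls_loss = ce[pos_mask | neg_mask].sum()
    # Localization loss is computed on positive anchors only.
    loc_loss = F.smooth_l1_loss(
        box_preds[pos_mask], box_targets[pos_mask], reduction="sum"
    )

    n = num_pos.sum().clamp(min=1).float()        # avoid division by zero
    return cls_loss / n, loc_loss / n
```

Returning the two terms separately makes it easy to log the validation classification and localization losses as separate curves, which is how the behaviour described above was observed.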
I am sorry if this is not the correct forum, but since I am using PyTorch, I thought someone else in the community might have faced similar issues.