SSD object detection model overfitting even with very high weight decay

I have been struggling with this problem for a long time and have probably run more than 100 experiments.

I am working on an object detection model for medical imaging using the SSD architecture with various ResNet backbones. I started with a resnet34 backbone and experimented with many hyperparameters. The training loss curve looks fine in almost all cases, but the validation loss starts to increase even with weight_decay = 0.1, which is a really high value. voc_map on the training set is above 90%, but on the validation set the best I have achieved with resnet34 is 35%, so the model is clearly overfitting heavily. I have used several augmentations, all of which were approved by the medical experts.

Another odd thing I noticed in the loss curves is that the validation classification loss generally starts to increase early, and after more epochs of training the localization loss also starts to increase.

Some other details:

Optimizer: AMSGrad
Classification loss: Cross-entropy with hard negative mining
Localization loss: Smooth L1 loss
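
For reference, the setup above corresponds roughly to the following sketch. It is not the exact training code. It assumes AMSGrad is enabled through `torch.optim.Adam(..., amsgrad=True)`, that class 0 is background, and that hard negative mining keeps a 3:1 negative-to-positive ratio. The names `ssd_loss` and `make_optimizer` and the learning rate are placeholders chosen for illustration.

```python
import torch
import torch.nn.functional as F


def ssd_loss(cls_logits, loc_preds, cls_targets, loc_targets, neg_pos_ratio=3):
    """SSD-style loss: cross-entropy with hard negative mining + Smooth L1.

    cls_logits:  (B, A, C) raw class scores; class 0 is assumed to be background
    loc_preds:   (B, A, 4) predicted box offsets
    cls_targets: (B, A)    int64 label per anchor, 0 = background
    loc_targets: (B, A, 4) encoded ground-truth offsets
    neg_pos_ratio: assumed 3:1 ratio of kept negatives to positives
    """
    pos_mask = cls_targets > 0                     # (B, A) anchors matched to an object
    num_pos = pos_mask.sum(dim=1, keepdim=True)    # (B, 1) positives per image

    # Per-anchor cross-entropy; cross_entropy expects the class dim second.
    ce = F.cross_entropy(cls_logits.transpose(1, 2), cls_targets, reduction="none")

    # Hard negative mining: rank background anchors by loss (highest first)
    # and keep only the top neg_pos_ratio * num_pos of them per image.
    neg_scores = ce.detach().masked_fill(pos_mask, -1.0)  # positives rank last
    rank = neg_scores.argsort(dim=1, descending=True).argsort(dim=1)
    neg_mask = (rank < neg_pos_ratio * num_pos) & ~pos_mask

    cls_loss = ce[pos_mask | neg_mask].sum()

    # Localization loss is computed on positive anchors only.
    loc_loss = F.smooth_l1_loss(
        loc_preds[pos_mask], loc_targets[pos_mask], reduction="sum"
    )

    # Normalize both terms by the number of positives (at least 1).
    n = num_pos.sum().clamp(min=1)
    return cls_loss / n, loc_loss / n


def make_optimizer(model, lr=1e-4, weight_decay=0.1):
    # lr is a placeholder; weight_decay=0.1 is the value mentioned above.
    return torch.optim.Adam(
        model.parameters(), lr=lr, weight_decay=weight_decay, amsgrad=True
    )
```

One thing worth noting about this sketch: with `torch.optim.Adam`, `weight_decay` is added to the gradient as an L2 penalty before the adaptive update, whereas `torch.optim.AdamW` applies it as decoupled weight decay, so the same value of 0.1 regularizes differently in the two.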

I am sorry if this is not the correct forum, but since I am using PyTorch, I thought someone in the community might have faced similar issues.