Validation loss doesn't decrease, then the model starts to overfit

I am training a model to predict whether there are abnormalities in CT images: the input is a CT image and the output is the probability of an abnormality. I use resnet18 as my backbone network and perform data preprocessing as https://pytorch.org/vision/main/models/generated/torchvision.models.resnet18.html suggests.
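Concretely, my setup is roughly this (the single-logit head is my own change for binary classification, not something from the docs):

```python
import torch
from torchvision.models import resnet18, ResNet18_Weights

# Pretrained weights and their matching preprocessing preset
# (resize -> center crop -> tensor -> ImageNet mean/std normalization),
# as recommended on the torchvision model page.
weights = ResNet18_Weights.IMAGENET1K_V1
preprocess = weights.transforms()

model = resnet18(weights=weights)
# Replace the 1000-class ImageNet head with a single logit
# for abnormal vs. normal.
model.fc = torch.nn.Linear(model.fc.in_features, 1)
```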
For the loss function, I use focal loss with gamma=2 and alpha=0.5, since almost 95% of the samples in my training dataset are negative.
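This is essentially the standard binary focal loss; a sketch of what I use (the function name and the mean reduction are my own choices):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.5):
    """Binary focal loss on raw logits (Lin et al., 2017)."""
    # Per-sample BCE, kept unreduced so it can be reweighted below.
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    # p_t is the predicted probability of the true class: bce = -log(p_t).
    p_t = torch.exp(-bce)
    # alpha weights positives, (1 - alpha) weights negatives;
    # note that alpha = 0.5 weights both classes equally.
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()
```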
At the beginning, the validation loss doesn't decrease, and there is a large gap between the training loss and the validation loss even without dropout or data augmentation. I am sure I use the same loss function during validation. I have tried reducing the number of layers in the decoder part of my model, but then it seems to underfit. Does anyone know what's going on here?