Focal loss for NLP/text classification in PyTorch - improving results

I have an NLP/text classification problem with a very skewed class distribution: class 0 - 98%, class 1 - 2%.
For my training and validation data I oversample, so the class distribution there is class 0 - 55%, class 1 - 45%.
The test data keeps the original skewed distribution.

I built a model using nn.BCEWithLogitsLoss(pos_weight=tensor(1.2579, device='cuda:0')). pos_weight was calculated using 55/45 (the class distribution in the training data).
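For reference, the usual convention is pos_weight = (# negative examples) / (# positive examples), so the loss up-weights the rare positive class. A minimal sketch, assuming that convention (the counts below are illustrative ratios, not actual dataset counts):

```python
# Sketch (assumption): pos_weight for nn.BCEWithLogitsLoss as the
# negative/positive ratio. Counts are illustrative, not from the question.
def pos_weight_from_counts(n_negative: int, n_positive: int) -> float:
    """Return the negative/positive ratio used as BCE pos_weight."""
    return n_negative / n_positive

# On the oversampled training split (55% / 45%) the ratio is small:
w_train = pos_weight_from_counts(55, 45)   # ~1.22

# On the original skewed distribution (98% / 2%) it would be much larger:
w_orig = pos_weight_from_counts(98, 2)     # 49.0

# In PyTorch this would then be passed as, e.g.:
#   criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([w_train]))
print(round(w_train, 4), w_orig)
```

Note how strongly the choice of distribution (oversampled vs. original) changes the weight.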

On class 1 of the test data I got an F1 of 0.07, with
(true negatives, false positives, false negatives, true positives) = (28809, 13258, 537, 495).

I changed to focal loss using the code below and my performance didn't improve much. F1 on class 1 of the test data is still about the same, with
(true negatives, false positives, false negatives, true positives) = (32527, 9540, 640, 392).
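Working through the counts shows why both runs land at roughly the same F1: precision on class 1 is tiny in both cases, even though recall differs. A quick sketch recomputing the metrics from the confusion counts above:

```python
# Recompute precision, recall and F1 for class 1 directly from the
# confusion-matrix counts, to see where the ~0.07 comes from.
def f1_from_counts(tn, fp, fn, tp):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, 2 * precision * recall / (precision + recall)

# BCEWithLogitsLoss run: decent recall but very low precision -> F1 ~ 0.07
p1, r1, f1_bce = f1_from_counts(28809, 13258, 537, 495)

# Focal-loss run: fewer false positives, but recall drops too -> F1 ~ 0.07
p2, r2, f1_focal = f1_from_counts(32527, 9540, 640, 392)
print(f1_bce, f1_focal)
```

With precision stuck near 0.04, F1 can't move much regardless of the loss; the false-positive count is the bottleneck.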

kornia.losses.binary_focal_loss_with_logits(probssss, labelsss, alpha=0.25, gamma=2.0, reduction='mean')
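To make the alpha/gamma question concrete, here is a minimal NumPy re-implementation of the standard binary focal loss with logits (the Lin et al. formulation) showing what the two knobs do. This is a sketch, not kornia's exact code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def binary_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Per-example binary focal loss (standard formulation, a sketch).
    alpha weights the positive class; gamma down-weights easy examples."""
    p = sigmoid(np.asarray(logits, dtype=float))
    t = np.asarray(targets, dtype=float)
    p_t = t * p + (1 - t) * (1 - p)              # prob. of the true class
    alpha_t = t * alpha + (1 - t) * (1 - alpha)  # class weight
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)

# gamma > 0 shrinks the loss on easy (confident, correct) examples far more
# than on hard ones, so training focuses on hard minority examples:
easy = binary_focal_loss(4.0, 1.0, gamma=2.0)   # confident correct positive
hard = binary_focal_loss(-1.0, 1.0, gamma=2.0)  # misclassified positive
print(float(easy), float(hard))
```

With gamma=0 this reduces to alpha-weighted BCE, so gamma controls how aggressively easy examples are ignored while alpha plays a role similar to pos_weight.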

  1. Are my alpha and gamma parameters wrong? Are there specific values I should try? I could tune them, but that might take a lot of time and resources, so I am looking for recommendations.
  2. For nn.BCEWithLogitsLoss(pos_weight=tensor(1.2579, device='cuda:0')), should I use a different value for pos_weight? Please keep in mind that my goal is maximum F1 on class 1 of the test data.
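One more thing worth checking before tuning the loss further: both pos_weight and focal loss shift the model's score distribution, so the default 0.5 decision threshold is rarely the F1-optimal one. A sketch of sweeping the threshold on held-out predictions (ideally a validation split with the original skewed distribution) to pick the F1-maximizing cutoff:

```python
import numpy as np

def best_f1_threshold(probs, labels, grid=None):
    """Sweep decision thresholds and return (threshold, F1) maximizing F1
    for class 1. A sketch: tune on a validation set whose class balance
    matches the skewed test data, then apply the threshold at test time."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    if grid is None:
        grid = np.linspace(0.05, 0.95, 19)
    best_t, best_f1 = 0.5, 0.0
    for t in grid:
        pred = (probs >= t).astype(int)
        tp = int(np.sum((pred == 1) & (labels == 1)))
        fp = int(np.sum((pred == 1) & (labels == 0)))
        fn = int(np.sum((pred == 0) & (labels == 1)))
        if tp == 0:
            continue
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

# Toy example with made-up scores:
probs = [0.1, 0.2, 0.4, 0.6, 0.8, 0.3, 0.7, 0.9]
labels = [0, 0, 0, 1, 1, 0, 1, 1]
t, f1 = best_f1_threshold(probs, labels)
```

Given the confusion matrices above (precision ~0.04 at the current cutoff), raising the threshold to trade recall for precision may buy more F1 than any change to alpha, gamma, or pos_weight alone.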