I have a binary NLP classification problem and my data is highly imbalanced: class 1 makes up only 2% of the data. For training I oversample class 1, so the training class distribution is 55%-45%. I have built a CNN; my last few layers and loss function are below.
self.batch_norm2 = nn.BatchNorm1d(num_filters)
self.fc2 = nn.Linear(np.sum(num_filters), fc2_neurons)
self.batch_norm3 = nn.BatchNorm1d(fc2_neurons)
self.fc3 = nn.Linear(fc2_neurons, 1)  # changed on 6 March - BCE with logits loss
Loss:
BCE_With_LogitsLoss = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([class_examples[0] / class_examples[1]]))  # pos_weight must be a tensor, not a plain float
In my evaluation function I call that loss as follows:
loss = BCE_With_LogitsLoss(torch.squeeze(probs), labels.float())  # 'probs' must be raw logits here - BCEWithLogitsLoss applies the sigmoid itself
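For reference, a minimal self-contained version of that setup (the class counts below are hypothetical, since `class_examples` isn't shown above) would look like:

```python
import torch
import torch.nn as nn

# Hypothetical counts: 9800 negatives, 200 positives (2% positive class)
class_examples = [9800, 200]

# pos_weight must be passed as a tensor, not a plain float
criterion = nn.BCEWithLogitsLoss(
    pos_weight=torch.tensor([class_examples[0] / class_examples[1]])
)

logits = torch.randn(8)             # raw model outputs (no sigmoid), one per example
labels = torch.randint(0, 2, (8,))  # binary targets
loss = criterion(logits, labels.float())
```

Note that the inputs are raw logits: `BCEWithLogitsLoss` applies the sigmoid internally, so passing already-sigmoided probabilities would be wrong.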
It was suggested that I use focal loss here:

> Please consider using focal loss: Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollár, "Focal Loss for Dense Object Detection" (ICCV 2017).
Is there a PyTorch implementation of this loss? I found a few but am not sure which are correct.
One example - https://stackoverflow.com/questions/71300607/using-focal-loss-for-imbalanced-dataset-in-pytorch
Another example - https://stackoverflow.com/questions/66178979/focal-loss-implementation
My questions:
- Is there a specific implementation that I should use?
- Should I use focal loss even though I am oversampling?
- How do I modify my code to use the correct implementation?
- Do we need to use `pos_weight` along with it?
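For context, here is a minimal sketch of binary focal loss on raw logits, following the Lin et al. formulation (the function name and the `alpha`/`gamma` defaults are my own choices following the paper, not anything from the code above):

```python
import torch
import torch.nn.functional as F

def binary_focal_loss_with_logits(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss on raw logits (Lin et al., ICCV 2017).

    alpha weights the positive class; gamma down-weights easy,
    well-classified examples so training focuses on the hard ones.
    """
    # Per-example BCE, unreduced, computed directly from logits
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    # p_t: model's probability of the true class for each example
    p_t = p * targets + (1 - p) * (1 - targets)
    # alpha_t: class-balancing weight (alpha for positives, 1-alpha for negatives)
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()
```

A handy sanity check: with `gamma=0` and `alpha=0.5` this reduces to half of plain (unweighted) BCE, since the focusing term and the class weighting both become constants.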