I used BCEWithLogitsLoss as the loss function (the model outputs raw logits) for a multilabel classification problem (most labels are negative, with only a few positives per sample). During training, I measured the F1 score on the train and validation datasets (converting the logits to probabilities with a sigmoid). The model has ~10 linear layers. The following are the loss and F1 scores over 100 epochs:
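For context, this is a minimal sketch of how I compute the loss and F1 (the logits and targets here are made-up toy values; I threshold the sigmoid outputs at 0.5 and micro-average F1 over all sample/label pairs):

```python
import torch
import torch.nn as nn

# Toy batch: 4 samples, 6 labels (multilabel, mostly negative)
logits = torch.tensor([[ 2.0, -3.0, -1.0, -4.0, -0.5, -2.0],
                       [ 1.5,  1.0, -2.0, -3.0, -1.0, -2.5],
                       [ 3.0, -2.0, -1.0, -1.0, -3.0, -2.0],
                       [-2.0, -1.0,  2.5, -3.0, -1.0, -1.5]])
targets = torch.tensor([[1., 0., 0., 0., 1., 0.],
                        [0., 1., 0., 0., 0., 0.],
                        [1., 0., 0., 0., 0., 0.],
                        [0., 0., 1., 0., 0., 0.]])

# BCEWithLogitsLoss applies the sigmoid internally, so raw logits go in
loss = nn.BCEWithLogitsLoss()(logits, targets)

# For F1, apply the sigmoid explicitly and threshold at 0.5
preds = (torch.sigmoid(logits) > 0.5).float()

# Micro-averaged F1 over all (sample, label) pairs
tp = (preds * targets).sum()
fp = (preds * (1 - targets)).sum()
fn = ((1 - preds) * targets).sum()
f1 = 2 * tp / (2 * tp + fp + fn)   # 0.8 for this toy batch
```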
As shown, the train loss jumps up and down after ~30 epochs. I added an L2 penalty like below:
optimizer = torch.optim.Adam(model1.parameters(), lr=0.0001, weight_decay=0.00001)
The `weight_decay` didn't help. Adding dropout just before the final classification layer didn't help either (should I add dropout to the hidden linear layers instead?).
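For reference, this is roughly how I would wire dropout into the hidden layers if that is the right approach (layer sizes here are made up, not my actual architecture):

```python
import torch
import torch.nn as nn

# Hypothetical sizes; the point is a Dropout after each hidden
# activation, not only right before the classification layer
model1 = nn.Sequential(
    nn.Linear(128, 256), nn.ReLU(), nn.Dropout(p=0.3),
    nn.Linear(256, 256), nn.ReLU(), nn.Dropout(p=0.3),
    nn.Linear(256, 10),          # raw logits, fed to BCEWithLogitsLoss
)

logits = model1(torch.randn(4, 128))   # shape (4, 10)
```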
I would really appreciate any suggestions for improving the F1 score of my model. I also wonder how to interpret the results: the train loss is higher than the validation loss, yet the F1 score on the train set is better than on the validation set.
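One thing I suspect about that last point: my train loss is computed with dropout active (`model.train()`), while the validation loss is computed in `model.eval()` mode, so the two numbers aren't measured under the same conditions. A minimal sketch with a toy model and random data showing that the same batch gives different losses in the two modes:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy model and data, just to compare train() vs eval() mode losses
model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(),
                      nn.Dropout(p=0.5), nn.Linear(8, 3))
x = torch.randn(64, 8)
y = torch.randint(0, 2, (64, 3)).float()
crit = nn.BCEWithLogitsLoss()

model.train()                       # dropout active, as during training
train_mode_loss = crit(model(x), y)

model.eval()                        # dropout disabled, as during validation
with torch.no_grad():
    eval_mode_loss = crit(model(x), y)
```

Is this mismatch enough to explain why the train loss can sit above the validation loss even though the train F1 is better?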