I have a classification problem related to facial recognition, where the real-world data is imbalanced and I am interested in the minority class. I am using transfer learning for this. The goal is to maximize the F1 score with respect to the minority class.
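To be explicit about the metric: below is a minimal sketch of how precision, recall and F1 for the minority class can be computed from predictions. It assumes the minority class is encoded as label `1`; the function name `minority_f1` and the toy labels are made up for illustration and are not from my actual pipeline.

```python
def minority_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels, made up for illustration (1 = minority class).
y_true = [1, 0, 0, 1, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
precision, recall, f1 = minority_f1(y_true, y_pred)
```

In this toy case there are 2 true positives, 2 false positives and 1 false negative, so false positives pull precision down to 0.5 while recall stays at 2/3.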

I started with an imbalanced data set and applied class weights as shown below. The F1 score I obtained was 0.65 on a test set curated to be difficult (performance on real-world data is good). Recall was somewhat lower than I wanted, however, so I needed to explore further.

```
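import torch
import torch.nn as nn

# Weights are indexed by class id; this ordering assumes class 0 is the minority class.
class_weights = torch.tensor([1 / no_samples_minority, 1 / no_samples_majority], dtype=torch.float32, device=device)
criterion = nn.CrossEntropyLoss(weight=class_weights)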
```
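For a sense of scale, here is a small worked example of what these inverse-frequency weights look like. The counts are approximations derived from the figures in the EDIT at the end of the post (80k samples, 14/86 split), not exact numbers, and the variable names are only for illustration.

```python
# Approximate counts, assuming 80k samples with a 14/86 minority/majority split.
n_total = 80_000
n_minority = round(0.14 * n_total)   # about 11,200
n_majority = n_total - n_minority    # about 68,800

# Raw inverse-frequency weights, in the same order as the snippet above.
raw = [1 / n_minority, 1 / n_majority]

# Same weights rescaled to sum to 1, which is easier to read.
normalized = [w / sum(raw) for w in raw]

# How much more a minority sample counts than a majority sample.
ratio = raw[0] / raw[1]
```

Under these assumptions the normalized weights come out to roughly 0.86 and 0.14, so each minority sample counts about six times as much as a majority sample in the loss.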

I then combined the above data with additional data so that the positive and negative classes are balanced. This roughly doubled the training data, so I was expecting better results; the quality of the new data is also good. With balanced weights (shown below), however, I now get an F1 score of 0.3 on the same test set. The model classifies almost everything as the minority class (too many false positives). The goal is to maximize both precision and recall, with a slight preference for precision.

```
# Equal weights for both classes, since the training data is now balanced.
class_weights = torch.tensor([0.5, 0.5], dtype=torch.float32, device=device)
criterion = nn.CrossEntropyLoss(weight=class_weights)
```
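As a sanity check on what the `[0.5, 0.5]` weights do, here is a plain-Python sketch of weighted cross-entropy. It assumes the weighted-mean reduction that, as far as I understand the documentation, `nn.CrossEntropyLoss` uses by default (sum of weighted per-sample losses divided by the sum of the weights of the targets). The helper `weighted_ce` and the logits are made up for illustration.

```python
import math


def weighted_ce(logits, targets, weights):
    """Weighted cross-entropy with reduction sum(w[y_i] * loss_i) / sum(w[y_i])."""
    numerator, denominator = 0.0, 0.0
    for scores, y in zip(logits, targets):
        log_sum_exp = math.log(sum(math.exp(s) for s in scores))
        loss_i = log_sum_exp - scores[y]  # equals -log(softmax(scores)[y])
        numerator += weights[y] * loss_i
        denominator += weights[y]
    return numerator / denominator


# Made-up two-class logits and targets.
logits = [[2.0, -1.0], [0.3, 0.8], [-0.5, 1.5]]
targets = [0, 1, 0]

equal = weighted_ce(logits, targets, [0.5, 0.5])
unweighted = weighted_ce(logits, targets, [1.0, 1.0])
skewed = weighted_ce(logits, targets, [0.86, 0.14])
```

Under that assumption, `equal` and `unweighted` are the same value, so the balanced run is effectively trained with a plain unweighted loss, whereas the earlier run used a loss that strongly favored one class.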

Thanks in advance for any help. What experiments would be worth trying next? Should I tune the class weights even for balanced data? If so, what methodology should I follow? Would a different loss function be more suitable?

Note: The data has many classes, and I have converted it into a one-vs-rest binary classification problem.
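The one-vs-rest conversion is roughly the following. This is a simplified sketch, not my actual preprocessing code; the helper `to_one_vs_rest` and the identity labels are hypothetical.

```python
def to_one_vs_rest(labels, target_class):
    """Map multi-class labels to binary: 1 for `target_class`, 0 for all other classes."""
    return [1 if label == target_class else 0 for label in labels]


# Hypothetical identity labels, for illustration only.
labels = ["alice", "bob", "carol", "alice", "dave", "bob"]
binary = to_one_vs_rest(labels, target_class="alice")

# Fraction of samples that end up in the positive class.
positive_rate = sum(binary) / len(binary)
```

Because every other class is merged into the negative label, the positive class tends to be a minority after this conversion.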

EDIT: The data set had 80k samples (14/86 minority/majority split) before adding more data, and 150k samples (50/50) after adding data to balance it.