Loss decreases slowly with WeightedRandomSampler

KarthikR · April 12, 2020, 4:32pm

Hi,

I have a multi-label classification problem. There are 4 targets (500 observations each for the first three labels and 50 observations for the fourth label). The loss function is cross entropy.

I use a pretrained model (ResNet18) in PyTorch.

Case 1:
When I use train_test_split from sklearn (with stratify) and use the splits as usual (creating an instance of the Dataset class and then feeding it to the DataLoader), my validation loss decreases over epochs.

Case 2:
When I do train_test_split (with stratify) and use WeightedRandomSampler (to account for the imbalance), my validation loss decreases slowly (requiring more epochs) and never reaches the level seen in “Case 1”. There is a lot of fluctuation in the loss too.
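For readers following along, here is a minimal sketch of what the Case 2 setup might look like. It is not the original code: the dataset is a synthetic stand-in whose class counts mirror the ones in this post (500/500/500/50), the features are random placeholders rather than images, and the inverse-frequency weighting is one common choice that is assumed here, not confirmed by the post.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

torch.manual_seed(0)

# Synthetic stand-in for the training split (assumption): class counts
# mirror the post, features are random placeholders.
class_counts = torch.tensor([500, 500, 500, 50])
labels = torch.repeat_interleave(torch.arange(4), class_counts)
features = torch.randn(len(labels), 8)
dataset = TensorDataset(features, labels)

# One weight per sample: the inverse frequency of that sample's class.
class_weights = 1.0 / class_counts.float()
sample_weights = class_weights[labels]

# Draw as many samples per epoch as the dataset holds, with replacement,
# so minority-class samples can be repeated within an epoch.
sampler = WeightedRandomSampler(
    weights=sample_weights,
    num_samples=len(sample_weights),
    replacement=True,
)

# The sampler takes the place of shuffle=True; DataLoader rejects both together.
loader = DataLoader(dataset, batch_size=64, sampler=sampler)

# Count how often each class is drawn over one epoch.
drawn = torch.zeros(4, dtype=torch.long)
for _, batch_labels in loader:
    drawn += torch.bincount(batch_labels, minlength=4)
print(drawn)
```

With these weights each class is drawn roughly equally often, so the 50 minority samples are repeated several times per epoch while some majority samples are not seen at all. In this sketch the sampler is attached to the training loader only.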

I looked at the code for WeightedRandomSampler on GitHub. With the WeightedRandomSampler, my batches now contain more observations of the minority class, which is what I want. My questions are:

Is “Case 2” the expected behavior when using the WeightedRandomSampler?
Any thoughts on how I can reduce the validation loss when using the WeightedRandomSampler?

ptrblck · April 13, 2020, 2:58am

In your first case, the loss might decrease fast if your model simply overfits the majority class(es) and thus ignores the minority class.
You could calculate the confusion matrix for both use cases and calculate the per-class accuracies to rate both approaches.
Note that a simple accuracy calculation might be misleading for an imbalanced use case, as explained by the Accuracy Paradox.
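
As a rough illustration of that comparison, the sketch below builds a confusion matrix and the per-class accuracies from hypothetical validation predictions. The class counts (100/100/100/10) and the predictions are made up for the example: they model a classifier that gets every majority sample right and never predicts the minority class. The helper names are illustrative; `sklearn.metrics.confusion_matrix` produces the same matrix layout (rows are true labels, columns are predictions).

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows index the true class, columns the predicted class."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

def per_class_accuracy(cm):
    # Diagonal = correct predictions; row sum = samples of that true class.
    return cm.diagonal() / cm.sum(axis=1)

# Hypothetical validation labels: three majority classes, one minority class.
y_true = np.repeat(np.arange(4), [100, 100, 100, 10])

# Hypothetical predictions: majority classes all correct,
# every minority sample (class 3) predicted as class 0.
y_pred = y_true.copy()
y_pred[y_true == 3] = 0

cm = confusion_matrix(y_true, y_pred, num_classes=4)
overall = cm.diagonal().sum() / cm.sum()
per_class = per_class_accuracy(cm)

print(cm)
print(f"overall accuracy: {overall:.3f}")   # 300 / 310, about 0.968
print("per-class accuracy:", per_class)     # class 3 scores 0.0
```

The overall accuracy looks high even though the minority class is never recognized, which is the effect the Accuracy Paradox describes. Running the same calculation on the predictions from both cases shows whether the sampler actually improves the minority class.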

KarthikR · April 13, 2020, 3:24am

Thank you, will explore.