Hi,
I have a multi-label classification problem. There are 4 targets (500 observations each for the first three labels and 50 observations for the fourth label). The loss function is cross entropy.
I use a pretrained model (ResNet18) in PyTorch.
Case 1:
When I use train_test_split from sklearn (with stratify) and use the result as usual (creating an instance of the Dataset class and then feeding it to the DataLoader), my validation loss decreases over epochs.
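For reference, a minimal sketch of the stratified split in Case 1. The label array, the 80/20 split ratio, and the random seed are assumptions for illustration, not my actual data or settings; only the class sizes (500, 500, 500, 50) come from the description above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels matching the class sizes above: 500, 500, 500, 50.
labels = np.repeat([0, 1, 2, 3], [500, 500, 500, 50])
indices = np.arange(len(labels))

# Assumed 80/20 split; stratify keeps the class proportions in both parts.
train_idx, val_idx = train_test_split(
    indices, test_size=0.2, stratify=labels, random_state=0
)

# Per-class counts in each part; class 3 stays roughly 10x rarer in both.
train_counts = np.bincount(labels[train_idx], minlength=4)
val_counts = np.bincount(labels[val_idx], minlength=4)
print("train:", train_counts)
print("val:  ", val_counts)
```

The index arrays would then be used to build the training and validation Dataset instances.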
Case 2:
When I do train_test_split (with stratify) and use WeightedRandomSampler (to account for the imbalance), my validation loss decreases slowly (requiring more epochs) but never reaches the level seen in “Case 1”. There is a lot of fluctuation in the loss too.
I looked at the code for WeightedRandomSampler on GitHub. Using the WeightedRandomSampler, my dataset now contains more observations of the minority class, which is what I want. My questions are:
- Is “Case 2” the expected behavior when using the WeightedRandomSampler?
- Any thoughts on how I can reduce the validation loss when using the WeightedRandomSampler?
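To make Case 2 concrete, here is a minimal sketch of how the sampler can be set up. The random placeholder features, the batch size, and the training class counts (400, 400, 400, 40, i.e. an assumed 80/20 stratified split) are illustrative assumptions, not my exact code. Each sample is weighted by the inverse frequency of its class, and the sampler is attached to the training DataLoader only.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

torch.manual_seed(0)

# Hypothetical training labels: 400, 400, 400, 40 (assumed 80/20 split).
train_labels = torch.repeat_interleave(
    torch.arange(4), torch.tensor([400, 400, 400, 40])
)
features = torch.randn(len(train_labels), 8)  # placeholder inputs, not images
train_ds = TensorDataset(features, train_labels)

# One weight per sample: the inverse of its class frequency.
class_counts = torch.bincount(train_labels, minlength=4)
sample_weights = (1.0 / class_counts.float())[train_labels]

# Draw with replacement so the rare class can appear as often as the others.
sampler = WeightedRandomSampler(
    sample_weights, num_samples=len(train_labels), replacement=True
)
train_loader = DataLoader(train_ds, batch_size=32, sampler=sampler)

# Class counts actually seen over one epoch of sampled batches.
seen = torch.cat([y for _, y in train_loader])
epoch_counts = torch.bincount(seen, minlength=4)
print(epoch_counts)
```

With this setup each class is drawn about equally often per epoch, so the 40 distinct minority samples are repeated many times, while the validation set keeps the original imbalanced distribution.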