Hello all, I have read some related questions about imbalanced classification; however, I did not find an answer. My dataset has imbalanced classes, with the distribution shown below.
I am using WeightedRandomSampler to handle this problem. First, train_labels holds the labels of the training set; it looks like train_labels=[0, 2, 1, 2, 4, 2, 4, 3, 5...]
class_sample_counts=np.unique(train_labels, return_counts=True)[1]
weights = 1 / torch.Tensor(class_sample_counts)
weighted_sampler = torch.utils.data.sampler.WeightedRandomSampler(weights, len(train_labels))
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, sampler=weighted_sampler)
The code runs, but with an ImageNet-pretrained model the top-1 accuracy on the training and validation sets reaches 100% very quickly, while the test accuracy is very bad (30%). If I do not use the sampler, the test accuracy is 70% and the training and validation accuracy grow slowly. What is happening in my code? Thanks
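
One possible explanation, offered as a sketch and not a confirmed diagnosis: WeightedRandomSampler expects one weight per dataset sample, and it only draws indices in range(len(weights)). If the weights vector has one entry per class, only the first few dataset indices are ever drawn, which could produce the fast 100% training accuracy and poor test accuracy described above. The snippet below shows how per-sample weights might be built instead. The toy labels, the TensorDataset, and the names class_weights, sample_weights and sampled_indices are assumptions made for illustration and are not from the original code.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Toy stand-in for the real training set (hypothetical data, 3 imbalanced classes).
train_labels = np.array([0] * 90 + [1] * 8 + [2] * 2)
train_dataset = TensorDataset(
    torch.randn(len(train_labels), 4),
    torch.from_numpy(train_labels),
)

# Per-class counts. Assumes labels are the integers 0..C-1 and every class occurs.
class_sample_counts = np.bincount(train_labels)
class_weights = 1.0 / class_sample_counts

# One weight per training sample: look up the weight of each sample's class.
sample_weights = torch.from_numpy(class_weights[train_labels])

weighted_sampler = WeightedRandomSampler(
    weights=sample_weights,
    num_samples=len(sample_weights),
    replacement=True,
)
train_loader = DataLoader(train_dataset, batch_size=32, sampler=weighted_sampler)

# Indices drawn for one epoch; they now range over the whole dataset.
sampled_indices = list(weighted_sampler)
```

With this layout len(sample_weights) equals len(train_labels), so every training sample can be drawn and rare classes are oversampled, instead of the same handful of indices being repeated.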