How to handle imbalanced classes

ptrblck · May 30, 2019, 4:58pm

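The correspondence between the dataset splits and sample_weights is broken.
train_labels and val_labels correspond to the shuffled indices, but both samplers only yield positions in the range 0 to len(weights) - 1, and the DataLoader applies these positions directly to the full dataset. Each weight therefore ends up assigned to the data indices starting at 0 in sequential order, not to the sample it was computed for.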

The easiest fix would be to wrap the dataset in a Subset before passing it to the DataLoader, so that position i of each sampler maps to the i-th index of the corresponding split:

from torch.utils.data import DataLoader, Subset

trainloader = DataLoader(Subset(dataset, train_indices), sampler=train_sampler, batch_size=10)
valloader = DataLoader(Subset(dataset, val_indices), sampler=val_sampler, batch_size=10)
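
In case it helps, here is a minimal self-contained sketch of the same idea for the training split. The toy dataset, the 90/10 class ratio, the 80/20 split, and the inverse-frequency weighting are all assumptions for illustration and are not taken from your code; only the Subset wrapping mirrors the fix above.

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset, WeightedRandomSampler

torch.manual_seed(0)

# Hypothetical imbalanced toy data: 90 samples of class 0, 10 of class 1.
# The feature of each sample is its own index, so we can trace where it came from.
labels = torch.cat([torch.zeros(90, dtype=torch.long), torch.ones(10, dtype=torch.long)])
data = torch.arange(100, dtype=torch.float32).unsqueeze(1)
dataset = TensorDataset(data, labels)

# Shuffled train/val split (assumed 80/20).
perm = torch.randperm(len(dataset))
train_indices = perm[:80].tolist()
val_indices = perm[80:].tolist()

# Labels in the order of the shuffled training indices.
train_labels = labels[train_indices]

# One weight per training sample: inverse frequency of its class in the split.
class_counts = torch.bincount(train_labels, minlength=2).float()
sample_weights = (1.0 / class_counts)[train_labels]

# The sampler yields positions in range(len(sample_weights)), i.e. 0..79.
train_sampler = WeightedRandomSampler(
    sample_weights, num_samples=len(sample_weights), replacement=True
)

# Subset maps position i to dataset[train_indices[i]], which is exactly
# the sample that sample_weights[i] was computed for.
trainloader = DataLoader(Subset(dataset, train_indices), sampler=train_sampler, batch_size=10)

xs, ys = [], []
for x, y in trainloader:
    xs.append(x)
    ys.append(y)
xs = torch.cat(xs).squeeze(1).long().tolist()
ys = torch.cat(ys)

print("fraction of class 1 in sampled batches:", ys.float().mean().item())
print("all samples come from the train split:", set(xs) <= set(train_indices))
```

Without the Subset wrapper, the same sampler would index dataset[0] to dataset[79] directly, so the weights would be applied to the wrong samples and some validation samples could leak into the training batches.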