Using WeightedRandomSampler for imbalanced classes

Hello,
I have an imbalanced dataset with 6 classes, and I'm using WeightedRandomSampler, but when I load the data, training doesn't work as expected. My code is below:

train_transforms = transforms.Compose([
    transforms.Resize((sz, sz)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

valid_transforms = transforms.Compose([
    transforms.Resize((sz, sz)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

train_ds = datasets.ImageFolder(train_path, train_transforms)
valid_ds = datasets.ImageFolder(valid_path, valid_transforms)

sample_count = [224, 477, 5027, 4497, 483, 247]
weight = 1 / torch.Tensor(sample_count)
sampler = WeightedRandomSampler(weight, batch_size) 

train_dl = torch.utils.data.DataLoader(train_ds, batch_size=batch_size, sampler=sampler)
valid_dl = torch.utils.data.DataLoader(valid_ds, batch_size=batch_size, shuffle=True)

train_ds_sz = len(train_ds)
valid_ds_sz = len(valid_ds)

print('Train size: {}\nValid size: {} ({:.2f})'.format(
    train_ds_sz, valid_ds_sz, valid_ds_sz / (train_ds_sz + valid_ds_sz)))

class_names = train_ds.classes

Any help would be appreciated.

I found that something is wrong with the targets: they are all zero, but I don't know why.

inputs, targets = next(iter(train_dl))
print(targets)

tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

What do you mean? Is there a syntax error? If so, post the traceback.
Or is it an accuracy problem?

@charan_Vjy
No, when I run it, nothing happens. My training loop is below:

num_epochs = 10
losses = []
for epoch in range(num_epochs):
    for i, (inputs, targets) in enumerate(train_dl):
        inputs = to_var(inputs)
        targets = to_var(targets)

        # forward pass
        optimizer.zero_grad()
        outputs = model(inputs)

        # loss
        loss = criterion(outputs, targets)
        losses += [loss.data]

        # backward pass
        loss.backward()

        # update parameters
        optimizer.step()

        # report
        if (i + 1) % 50 == 0:
            print('Epoch [%2d/%2d], Step [%3d, %3d], Loss: %.4f'
                  % (epoch + 1, num_epochs, i + 1, len(train_ds) // batch_size, loss.data))

As I said above, I found that something is wrong with the targets.

Print out something every step rather than every 50 steps. Print out the losses. We need to figure out what's happening first. As for the target, why is having targets of 0 a problem? Was there supposed to be some other value?
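For example, a minimal change to the report block in your loop, printing at every step (loss.item() reads the scalar value; on older PyTorch versions keep loss.data):

print('Epoch [%2d/%2d], Step [%3d], Loss: %.4f'
      % (epoch + 1, num_epochs, i + 1, loss.item()))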

I found an example here and modified it to create a sampler for my data as below:

cls0 = np.zeros(224, dtype=np.int32)
cls1 = np.ones(477, dtype=np.int32)
cls2 = np.full(5027, 2, dtype=np.int32)
cls3 = np.full(4497, 3, dtype=np.int32)
cls4 = np.full(483, 4, dtype=np.int32)
cls5 = np.full(247, 5, dtype=np.int32)

target = np.hstack((cls0, cls1, cls2, cls3, cls4, cls5))

class_sample_count = np.unique(target, return_counts=True)[1]

weight = 1. / class_sample_count
samples_weight = weight[target]

samples_weight = torch.from_numpy(samples_weight)
sampler = WeightedRandomSampler(samples_weight, len(samples_weight))

I'm not sure this is correct, but with this sampler, the targets get proper values.

inputs, targets = next(iter(train_dl)) # Get a batch of training data
print(targets)

tensor([1, 5, 3, 4, 3, 0, 5, 2, 0, 0, 4, 1, 5, 0, 5, 5, 5, 5, 2, 5, 1, 1, 0, 3])

and training runs, but the number of samples loaded per epoch is the same as the total number of samples in the dataset (since num_samples was set to len(samples_weight)).

total number of samples = 10955
batch_size = 24
steps per epoch = 10955 // 24 = 456

Epoch [ 1/ 2], Step [ 50, 456], Loss: 1.5504
Epoch [ 1/ 2], Step [100, 456], Loss: 1.6046
Epoch [ 1/ 2], Step [150, 456], Loss: 1.6864
Epoch [ 1/ 2], Step [200, 456], Loss: 1.6291
Epoch [ 1/ 2], Step [250, 456], Loss: 1.4469
Epoch [ 1/ 2], Step [300, 456], Loss: 1.7395
Epoch [ 1/ 2], Step [350, 456], Loss: 1.6110
Epoch [ 1/ 2], Step [400, 456], Loss: 1.4821
Epoch [ 1/ 2], Step [450, 456], Loss: 1.7239
Epoch [ 2/ 2], Step [ 50, 456], Loss: 1.3867
Epoch [ 2/ 2], Step [100, 456], Loss: 1.6165
Epoch [ 2/ 2], Step [150, 456], Loss: 1.6229
Epoch [ 2/ 2], Step [200, 456], Loss: 1.4635
Epoch [ 2/ 2], Step [250, 456], Loss: 1.5007
Epoch [ 2/ 2], Step [300, 456], Loss: 1.6607
Epoch [ 2/ 2], Step [350, 456], Loss: 1.6613
Epoch [ 2/ 2], Step [400, 456], Loss: 1.5939
Epoch [ 2/ 2], Step [450, 456], Loss: 1.4794

Note that the input to the WeightedRandomSampler in PyTorch's example is weight[target], not weight. The length of weight[target] equals the number of samples (one weight per sample), whereas the length of weight equals the number of classes. This is probably the reason for the difference: with only 6 weights, your first sampler could only draw indices 0-5, and since ImageFolder orders samples by class, those indices all belong to class 0, which is why every target was zero. Try passing replacement=False to WeightedRandomSampler to prevent the same sample from being drawn repeatedly.

As far as the loss at each step goes, it looks good. See if you can aggregate all the losses and check whether the average loss for each subsequent epoch is decreasing; see the sketch below.
However, having a batch where every sample comes from the same class is definitely an issue.
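A quick sketch of that epoch-average check, assuming the losses list collected in your training loop:

# average the recorded per-step losses over each epoch
steps_per_epoch = len(losses) // num_epochs
epoch_means = [
    float(sum(losses[e * steps_per_epoch:(e + 1) * steps_per_epoch])) / steps_per_epoch
    for e in range(num_epochs)
]
print(epoch_means)   # should trend downwards if training is working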

@charan_Vjy
Thanks for your help, but I didn't understand what exactly I need to do.
Should num_samples in WeightedRandomSampler be the total number of samples, the batch_size, or the length of the smallest class?
Also, are my target values wrong in this way?
I made the change below and got the following error. If you could show me in code, that would be great.

sampler = WeightedRandomSampler([224,477,5027,4497,483,247], len(samples_weight), replacement=False)

RuntimeError: cannot sample n_sample > prob_dist.size(-1) samples without replacement

The weights should correspond to each sample in the train set. If there are 10,000 samples in the train set, there should be one weight for each of the 10,000 samples. You would want to do something like this:

targets = np.array(train_ds.targets)   # ImageFolder stores all the labels in .targets
weight = 1. / class_sample_count
samples_weight = weight[targets]       # one weight per sample, looked up by its label
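Putting it together, a minimal end-to-end sketch (assuming train_ds is a torchvision ImageFolder):

import numpy as np
import torch
from torch.utils.data import WeightedRandomSampler

# one integer label per image, in dataset order
targets = np.array(train_ds.targets)

# per-class weight: inverse class frequency
class_sample_count = np.unique(targets, return_counts=True)[1]
weight = 1.0 / class_sample_count

# per-sample weight: each sample gets the weight of its class
samples_weight = torch.from_numpy(weight[targets])

# num_samples = dataset size, so one epoch draws as many samples as the
# dataset contains; replacement=True is what lets minority-class images
# be drawn repeatedly (i.e., oversampling)
sampler = WeightedRandomSampler(samples_weight, len(samples_weight), replacement=True)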

When I try to get the targets from train_ds, I get zeros. I think I got all the targets correctly the previous way; the only thing I haven't understood is why the targets of a batch of data are still imbalanced. For example, I changed batch_size to 6 (the number of my classes) and passed it as num_samples to WeightedRandomSampler, and after loading a batch of data I expected a target with one sample of each class, but I got something different:

weight = 1. / class_sample_count
samples_weight = weight[target]

sampler = WeightedRandomSampler(samples_weight, batch_size, replacement=True)

inputs, targets = next(iter(train_dl))     # Get a batch of training data
print(targets)

tensor([5, 3, 1, 4, 5, 5])

Below are examples from PyTorch's documentation which address your question.

list(WeightedRandomSampler([0.1, 0.9, 0.4, 0.7, 3.0, 0.6], 5, replacement=True))
Output: [0, 0, 0, 1, 0]
list(WeightedRandomSampler([0.9, 0.4, 0.05, 0.2, 0.3, 0.1], 5, replacement=False))
Output: [0, 1, 4, 3, 2]

For num_samples smaller than the number of weights (here, a batch size smaller than the number of classes), replacement=False generates unique indices. If num_samples is larger than the number of weights, it throws this error: RuntimeError: cannot sample n_sample > prob_dist.size(-1) samples without replacement.
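You can reproduce both behaviours directly with the sampler alone, e.g.:

from torch.utils.data import WeightedRandomSampler

# 6 weights, 5 draws without replacement: the indices come out unique
print(list(WeightedRandomSampler([0.9, 0.4, 0.05, 0.2, 0.3, 0.1], 5, replacement=False)))

# asking for more draws than there are weights raises the error above
try:
    list(WeightedRandomSampler([0.9, 0.4, 0.05, 0.2, 0.3, 0.1], 7, replacement=False))
except RuntimeError as e:
    print(e)   # cannot sample n_sample > prob_dist.size(-1) samples without replacement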

I’m so confused. Here is what I did and its result:

batch_size = 6
weight = 1. / class_sample_count
samples_weight = weight[target]
sampler = WeightedRandomSampler(samples_weight, batch_size, replacement=False)
print('sampler= ', list(sampler))

sampler= [8857, 190, 210, 8028, 10662, 1685]

train_dl = torch.utils.data.DataLoader(train_ds, batch_size=batch_size, sampler=sampler, num_workers=16)
valid_dl = torch.utils.data.DataLoader(valid_ds, batch_size=batch_size, shuffle=True, num_workers=16)

inputs, targets = next(iter(train_dl))     # Get a batch of training data
print('targets= ', targets)

targets= tensor([4, 5, 5, 0, 3, 3])

Loss over 15 epochs (printed at the first step of each epoch):

Epoch [ 1/15], Step [  1, 1825], Loss: 1.5785
Epoch [ 2/15], Step [  1, 1825], Loss: 1.9562
Epoch [ 3/15], Step [  1, 1825], Loss: 1.8681
Epoch [ 4/15], Step [  1, 1825], Loss: 2.0667
Epoch [ 5/15], Step [  1, 1825], Loss: 1.9168
Epoch [ 6/15], Step [  1, 1825], Loss: 1.8286
Epoch [ 7/15], Step [  1, 1825], Loss: 1.9063
Epoch [ 8/15], Step [  1, 1825], Loss: 1.8187
Epoch [ 9/15], Step [  1, 1825], Loss: 1.6252
Epoch [10/15], Step [  1, 1825], Loss: 2.3157
Epoch [11/15], Step [  1, 1825], Loss: 1.7716
Epoch [12/15], Step [  1, 1825], Loss: 2.1706
Epoch [13/15], Step [  1, 1825], Loss: 1.7085
Epoch [14/15], Step [  1, 1825], Loss: 1.7478
Epoch [15/15], Step [  1, 1825], Loss: 2.0860
evaluate_model(model, valid_dl)

accuracy: 19.58

evaluate_model(model, train_dl)

accuracy: 0.00

It seems that something is going wrong!

This is interesting. The class labels in the batch are not unique even though replacement=False. That is actually expected: replacement=False makes the sampled indices unique (compare your printed sampler output), not the labels, and weighted sampling balances classes in expectation rather than guaranteeing one sample per class in every batch.
As far as the loss is concerned, this could be down to a couple of problems. Try the following:

  • Try different learning rates (smaller than the one you are currently using).
  • Remove all regularization and momentum until the loss starts decreasing.
  • Check the inputs right before they go into the model (detach and plot them), and check their correspondence with the labels.
  • You may also be updating the gradients far too many times as a consequence of the small batch size. Since the targets within a batch are still not unique, you may as well keep a larger batch; see the sketch below for a way to check what the sampler actually produces.
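To see whether the sampler is doing its job, count the class frequencies over a number of batches; weighted sampling only balances classes in expectation, so individual batches will still vary. A minimal sketch using the train_dl above:

from collections import Counter

counts = Counter()
for i, (inputs, targets) in enumerate(train_dl):
    counts.update(targets.tolist())
    if i >= 100:            # ~100 batches is enough to see the trend
        break
print(counts)               # expect roughly equal counts per class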