Dealing with imbalanced datasets in pytorch

I am trying to find a way to deal with imbalanced data in pytorch. I was used to Keras’ class_weight, although I am not sure what it really did (I think it was a matter of penalizing more or less certain classes).

The only solution I have found in pytorch is using WeightedRandomSampler with DataLoader, which is simply a way to draw roughly the same number of samples from each class (and maybe duplicate the samples of some classes if needed?). However, I am looking for an alternative like the one provided in Keras that does not involve repeating some samples. Is it possible to do it in pytorch? Thanks in advance.


You can also apply class weighting using the weight argument for a lot of loss functions.
nn.NLLLoss or nn.CrossEntropyLoss both include this argument.
You can find all loss functions here.
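
As a minimal sketch of what this looks like with nn.CrossEntropyLoss (the class weights, shapes and targets below are made up for illustration, not taken from your data):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical 3-class problem in which class 1 is rare, so it gets a larger weight.
class_weights = torch.tensor([1.0, 5.0, 1.0])

logits = torch.randn(4, 3)            # [batch_size, nb_classes], raw scores
target = torch.tensor([0, 1, 2, 1])   # class indices

criterion = nn.CrossEntropyLoss(weight=class_weights)
loss = criterion(logits, target)

# With reduction='mean' the weighted loss is normalized by the sum of the
# weights of the targets in the batch, not by the batch size.
per_sample = nn.CrossEntropyLoss(reduction='none')(logits, target)
w = class_weights[target]
manual = (per_sample * w).sum() / w.sum()
```

No sample is repeated here; the rare class simply contributes more to the loss and therefore to the gradients.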


You are right, did not realize that! :slight_smile:

Let me be a bit more specific then. I am building a hierarchical classifier that combines 3 losses (3 levels of classification). It is a multi-label problem and I am using BCELoss at the 3 different levels, then summing all and backpropagating the errors. In this case, should I instantiate 3 different BCELoss and pass them a list containing the weights for the classes corresponding to the hierarchy level?

Could you explain a bit more about how you use the BCELoss at 3 different levels?
Do you assign something like this:

0.0 - class0
0.5 - class1
1.0 - class2

If so, I would recommend to weight the different predictions:

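import torch
import torch.nn as nn

batch_size = 5
nb_classes = 3
output = torch.randn(batch_size, nb_classes)  # raw logits
target = torch.empty(batch_size, nb_classes).random_(2)  # random 0/1 targets
weight = torch.tensor([1.0, 2.0, 1.0])  # one weight per class

criterion = nn.BCEWithLogitsLoss(reduction='none')
loss = criterion(output, target)  # shape: [batch_size, nb_classes]
loss = loss * weight  # broadcast over the batch dimension
loss = loss.mean()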

Would that work for you?


Thank you for helping me out :smile:

This is how I update my weights, each target and y_pred are binary vectors.

 # Compute loss and update parameters for all levels
loss1 = criterion(y_pred1.cuda(), target1)
loss2 = criterion(y_pred2.cuda(), target2)
loss3 = criterion(y_pred3.cuda(), target3)
loss = sum([loss1, loss2, loss3])  # combine all losses

My loss is criterion = nn.BCELoss(). My idea was to initialize 3 different loss functions instead (criterion1, criterion2 and criterion3) so that I could pass the weight vector right away. Something like

criterion1 = nn.BCELoss(weight=weights1)
criterion2 = nn.BCELoss(weight=weights2)
criterion3 = nn.BCELoss(weight=weights3)

Should also work fine.
One side note: why do you call .cuda() on the predictions?
They should already be on the GPU if your input and model are pushed to a CUDA device.


You are right, it makes no difference to call .cuda().

It seems like with weights the loss function needs to be on GPU, is that true?

Yes, weight should be on the same device as your output and target.
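
A small sketch of one way to keep everything on the same device (the weight values and shapes are placeholders). It falls back to the CPU when no GPU is available:

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical per-class weights, created directly on the target device.
weights = torch.tensor([1.0, 2.0, 1.0], device=device)
criterion = nn.BCELoss(weight=weights)

# Stand-ins for the model output (probabilities) and the binary targets.
y_pred = torch.sigmoid(torch.randn(5, 3, device=device))
target = torch.empty(5, 3, device=device).random_(2)

loss = criterion(y_pred, target)
```

As far as I know, the weight is stored as a buffer of the loss module, so calling criterion.to(device) after construction should move it as well.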


Thank you. One last question: what should be the rule of thumb for assigning weights to the labels? I am assigning according to the following rule: weight_label_l = max(nr_samples_per_label) / nr_samples_label_l


Sounds reasonable. You could probably also try sum(nr_samples_per_label).
I’m not sure if there is a general rule of thumb as you might want to balance your per-class accuracies manually.
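
To make the two variants concrete, here is a plain-Python sketch with made-up label counts (replace them with the counts from your own training set):

```python
# Hypothetical per-label sample counts.
nr_samples_per_label = [900, 90, 10]

# Rule from the post above: the most frequent label gets weight 1.0.
largest = max(nr_samples_per_label)
weights_max = [largest / n for n in nr_samples_per_label]

# Variant using the total count instead of the maximum.
total = sum(nr_samples_per_label)
weights_sum = [total / n for n in nr_samples_per_label]

print(weights_max)  # [1.0, 10.0, 90.0]
print(weights_sum)
```

Both variants give the same relative weighting between labels; they only differ by a constant factor, which scales the overall loss (and so interacts with the learning rate).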


Shouldn’t the weights be like:

weight_label_i =  nr_samples_of_label_i  / total_number_of_samples

i = 1, 2, ...


This would weight the majority classes higher, while we would like to weight the loss of minority classes higher or am I mistaken?


You are right. The BCELoss is given by:
$$\ell(x, y) = L = \{l_1, \dots, l_n, \dots, l_N\}^\top, \quad l_n = -w_n \left[ y_n \cdot \log x_n + (1 - y_n) \cdot \log(1 - x_n) \right]$$

So in this case we should invert the ratio above:
weight_label_i = total_number_of_samples / nr_samples_of_label_i

The documentation also says:

weight (Tensor, optional) – a manual rescaling weight given to the loss of each batch element. If given, has to be a Tensor of size “nbatch”

So the weight vector works at the batch level and not at the per-class level, or am I missing something here?
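
One way to check how weight is actually applied is to compare the built-in result against a manual computation. The sketch below assumes that the weight tensor is broadcast against the input, which appears to be the behavior in recent PyTorch releases; the shapes and values are made up:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, nb_classes = 5, 3
probs = torch.sigmoid(torch.randn(batch_size, nb_classes))  # BCELoss expects probabilities
target = torch.empty(batch_size, nb_classes).random_(2)

unweighted = nn.BCELoss(reduction='none')(probs, target)    # [batch_size, nb_classes]

# A weight of shape [nb_classes] is broadcast over the batch: per-class weighting.
class_weight = torch.tensor([1.0, 2.0, 1.0])
loss_class = nn.BCELoss(weight=class_weight)(probs, target)
manual_class = (unweighted * class_weight).mean()

# A weight of shape [batch_size, 1] is broadcast over the classes: per-sample weighting.
sample_weight = torch.rand(batch_size, 1)
loss_sample = nn.BCELoss(weight=sample_weight)(probs, target)
manual_sample = (unweighted * sample_weight).mean()
```

Under that assumption, the shape of the weight tensor decides whether it acts per class or per batch element.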


Yeah, you are right! Thanks for pointing this out. I tried not to mix up the two new threads about weighting and thought we were dealing with a classification loss like nn.NLLLoss.

@Skinish have a look at this thread to see how to apply pos_weight instead.


I am not sure what pos_weight will do. It is an argument of nn.BCEWithLogitsLoss (nn.BCELoss only takes weight), and it does not seem to fulfill what is needed here (I could be mistaken). The way I see it, a simple way to resolve this is to multiply both the output and the target/label vector elementwise by the weight vector when computing the training loss, as follows:

criterion = nn.BCEWithLogitsLoss()
loss = criterion(W*output.float(), W*target.float())

For more flexibility and ease, the (balance) weight vector could be generated within the dataset class.

I was trying to implement this weight multiplication, not sure if this is the best way to do it as I had to use torch.transpose twice, here it goes:

# 10 is the batch size, so each sample has a weight value

torch.Size([10, 644])
tensor([ 0.9987,  0.9997,  0.9997,  0.9992,  0.9997,  0.9985,  0.9905,
         0.9911,  0.9476,  0.9944], dtype=torch.float64, device='cuda:0')

ipdb> output = torch.mul(weight, torch.transpose(output, 0, 1) )
ipdb> output = torch.transpose(output, 0,1)
torch.Size([644, 10])

NB. I have added the weights to the Dataset class

The whole thing will look like this:

if cf.use_weight_to_balance_data:
    weight =
    output = torch.mul(weight, torch.transpose(output.double(), 0, 1))
    output = torch.transpose(output, 0, 1)
    target = torch.mul(weight, torch.transpose(target.double(), 0, 1))
    target = torch.transpose(target, 0, 1)

@Deeply @ptrblck thank you for all your help. Is the weight multiplication the way to go, then? I did not quite understand what would be wrong with passing the weight vector to the loss function (via the pos_weight argument), although I agree that the weight assignment I proposed should be changed.

What would be the effect of passing a vector to weight, then?

@Deeply I’m not sure it’s a good idea to multiply the output and target directly with the weight.
Both are passed to the criterion and maybe a sigmoid, so I would rather multiply the loss.
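
A rough sketch of what I mean, reusing the shapes from your post (10 samples, 644 labels). The per-sample weights are random placeholders here and would come from your Dataset instead:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, nb_labels = 10, 644
output = torch.randn(batch_size, nb_labels)               # raw logits from the model
target = torch.empty(batch_size, nb_labels).random_(2)    # binary multi-label targets
sample_weight = torch.rand(batch_size)                    # placeholder per-sample weights

# Keep the element-wise loss so it can be weighted before reducing.
criterion = nn.BCEWithLogitsLoss(reduction='none')
unreduced = criterion(output, target)                     # [batch_size, nb_labels]

# unsqueeze(1) turns the weights into shape [batch_size, 1], so they
# broadcast over the labels and no transpose is needed.
loss = (unreduced * sample_weight.unsqueeze(1)).mean()
```

This leaves the predictions and targets untouched and only rescales how much each sample contributes to the final loss.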

@Skinish here is a github issue discussing the introduction of pos_weight and a comparison between weight and pos_weight.
As you have different criteria for your targets and predictions, I think using pos_weight would work.
Let me know, if that works for you.
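
In case it helps, here is a minimal sketch of pos_weight with made-up data. Note that it belongs to nn.BCEWithLogitsLoss, so the model would have to return raw logits rather than sigmoid outputs. The ratio of negatives to positives per label used below is one common choice, not the only one, and in practice the counts would be taken over the whole training set rather than a single batch:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(8, 3)                  # [batch_size, nb_labels], raw scores
target = torch.empty(8, 3).random_(2)       # binary multi-label targets
target[0] = 1.0                             # make sure every label has a positive

# Number of negatives divided by number of positives, per label.
num_pos = target.sum(dim=0)
num_neg = target.shape[0] - num_pos
pos_weight = num_neg / num_pos              # shape: [nb_labels]

criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
loss = criterion(logits, target)

# Manual version: only the positive term of each label is rescaled.
manual = -(pos_weight * target * F.logsigmoid(logits)
           + (1 - target) * F.logsigmoid(-logits)).mean()
```

For your hierarchical setup this would mean one criterion per level, each with its own pos_weight vector.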