Dealing with imbalanced datasets in pytorch

Skinish · August 7, 2018, 1:37pm

I am trying to find a way to deal with imbalanced data in pytorch. I was used to Keras’ class_weight, although I am not sure what it really did (I think it was a matter of penalizing more or less certain classes).

The only solution that I have found in pytorch is using WeightedRandomSampler with DataLoader, which is simply a way to draw more or less the same number of samples per class (and maybe duplicate the samples of some classes if needed?). However, I am looking for another alternative, like the one provided in Keras, that does not involve repeating some samples. Is it possible to do it in pytorch? Thanks in advance.
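
For reference, a minimal sketch of the WeightedRandomSampler approach mentioned above. The toy two-class dataset and all variable names are assumptions for illustration, not taken from the thread. Each sample is weighted by the inverse frequency of its class, so minority samples are drawn (and repeated) more often:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

torch.manual_seed(0)

# Hypothetical toy data: 90 samples of class 0 and 10 samples of class 1.
features = torch.randn(100, 4)
labels = torch.cat([torch.zeros(90, dtype=torch.long),
                    torch.ones(10, dtype=torch.long)])
dataset = TensorDataset(features, labels)

# One weight per sample: the inverse frequency of that sample's class.
class_counts = torch.bincount(labels)
sample_weights = 1.0 / class_counts[labels].float()

# replacement=True allows minority samples to be drawn more than once.
sampler = WeightedRandomSampler(sample_weights,
                                num_samples=len(dataset),
                                replacement=True)
loader = DataLoader(dataset, batch_size=20, sampler=sampler)

# Collect the labels seen in one pass; the counts should be roughly balanced.
drawn = torch.cat([batch_labels for _, batch_labels in loader])
print(torch.bincount(drawn, minlength=2))
```

Because sampling is done with replacement, this does repeat minority samples, which is exactly the behaviour the question wants to avoid.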

ptrblck · August 7, 2018, 1:46pm

You can also apply class weighting using the weight argument for a lot of loss functions.
nn.NLLLoss or nn.CrossEntropyLoss both include this argument.
You can find all loss functions here.
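
A minimal sketch of class weighting through the weight argument, assuming a three-class problem with made-up weights. It also checks by hand that, with the default reduction='mean', the weighted losses are divided by the sum of the weights of the targets rather than by the batch size:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

logits = torch.randn(6, 3)                  # raw scores for 3 classes
target = torch.tensor([0, 0, 0, 0, 1, 2])   # class 0 dominates

# Hypothetical weights: the rarer classes get a larger weight.
class_weights = torch.tensor([1.0, 4.0, 4.0])

criterion = nn.CrossEntropyLoss(weight=class_weights)
loss = criterion(logits, target)

# Manual check: weight each per-sample loss by the weight of its target class,
# then normalize by the sum of those weights.
per_sample = nn.CrossEntropyLoss(reduction='none')(logits, target)
manual = (per_sample * class_weights[target]).sum() / class_weights[target].sum()

print(loss.item(), manual.item())
```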

Skinish · August 7, 2018, 1:59pm

You are right, did not realize that!

Let me be a bit more specific then. I am building a hierarchical classifier that combines 3 losses (3 levels of classification). It is a multi-label problem and I am using BCELoss at the 3 different levels, then summing all and backpropagating the errors. In this case, should I instantiate 3 different BCELoss and pass them a list containing the weights for the classes corresponding to the hierarchy level?

ptrblck · August 7, 2018, 2:12pm

Could you explain a bit more about how you use the BCELoss at 3 different levels?
Do you assign something like this:

0.0 - class0
0.5 - class1
1.0 - class2

?
If so, I would recommend to weight the different predictions:

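import torch
import torch.nn as nn

batch_size = 5
nb_classes = 3
output = torch.randn(batch_size, nb_classes)             # raw logits
target = torch.empty(batch_size, nb_classes).random_(2)  # random 0/1 targets
weight = torch.tensor([1.0, 2.0, 1.0])                   # one weight per class

# reduction='none' keeps the per-element losses so they can be weighted
criterion = nn.BCEWithLogitsLoss(reduction='none')
loss = criterion(output, target)
loss = loss * weight  # broadcasts over the batch dimension
loss = loss.mean()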

Would that work for you?

Skinish · August 7, 2018, 2:19pm

Thank you for helping me out

This is how I update my weights, each target and y_pred are binary vectors.

# Compute loss and update parameters for all levels
loss1 = criterion(y_pred1.cuda(), target1)
loss2 = criterion(y_pred2.cuda(), target2)
loss3 = criterion(y_pred3.cuda(), target3)
loss = sum([loss1, loss2, loss3])  # combine all losses
loss.backward()

My loss is criterion = nn.BCELoss(). My idea was to initialize 3 different loss functions instead (criterion1, criterion2 and criterion3) so that I could pass the weight vector right away. Something like

criterion1 = nn.BCELoss(weight=weights1)
criterion2 = nn.BCELoss(weight=weights2)
criterion3 = nn.BCELoss(weight=weights3)

ptrblck · August 7, 2018, 2:27pm

Should also work fine.
One side note: why do you call .cuda() on the predictions?
They should already be on the GPU if your input and model are pushed to a CUDA device.

Skinish · August 7, 2018, 3:07pm

You are right, it makes no difference to call .cuda().

Skinish · August 7, 2018, 4:19pm

It seems like with weights the loss function needs to be on GPU, is that true?

ptrblck · August 7, 2018, 4:40pm

Yes, weight should be on the same device as your output and target.
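
A minimal sketch of keeping everything on one device. The nn.Linear stand-in model, the shapes, and the weight values are assumptions for illustration. The weight tensor is created directly on the device that holds the model and the data:

```python
import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = nn.Linear(8, 3).to(device)          # hypothetical stand-in model
inputs = torch.randn(4, 8, device=device)
target = torch.empty(4, 3, device=device).random_(2)

# Create the weight on the same device as the output and target.
weight = torch.tensor([1.0, 2.0, 1.0], device=device)
criterion = nn.BCEWithLogitsLoss(weight=weight)

output = model(inputs)                      # already on `device`, no .cuda() needed
loss = criterion(output, target)
loss.backward()
```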

Skinish · August 7, 2018, 4:47pm

Thank you. One last question: what should be the rule of thumb for assigning weights to the labels? I am assigning according to the following rule: weight_label_l = max(nr_samples_per_label) / nr_samples_label_l

ptrblck · August 7, 2018, 4:54pm

Sounds reasonable. You could probably also try sum(nr_samples_per_label) as the numerator.
I’m not sure if there is a general rule of thumb as you might want to balance your per-class accuracies manually.
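
A small sketch of both weighting rules, using made-up sample counts for four labels. The two variants differ only by a constant factor, so they rank the labels identically:

```python
import torch

# Hypothetical per-label sample counts for four labels.
nr_samples_per_label = torch.tensor([500.0, 100.0, 50.0, 10.0])

# Rule from the question: the most frequent label gets weight 1.
weights_max = nr_samples_per_label.max() / nr_samples_per_label

# Variant with the total number of samples as the numerator.
weights_sum = nr_samples_per_label.sum() / nr_samples_per_label

print(weights_max)  # values: 1, 5, 10, 50
print(weights_sum)  # values: 1.32, 6.6, 13.2, 66
```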

Deeply · August 7, 2018, 5:34pm

Shouldn’t the weights be like:

weight_label_i =  nr_samples_of_label_i  / total_number_of_samples

i = 1, 2, ...

??

ptrblck · August 7, 2018, 5:37pm

This would weight the majority classes higher, while we would like to weight the loss of minority classes higher or am I mistaken?

Deeply · August 7, 2018, 5:49pm

You are right. The BCELoss is given by:
$$\ell(x, y) = L = \{l_1, \dots, l_n, \dots, l_N\}^\top, \quad l_n = -w_n \left[ y_n \cdot \log x_n + (1 - y_n) \cdot \log(1 - x_n) \right]$$

So in this case we invert the ratio I proposed:
weight_label_i = total_number_of_samples / nr_samples_of_label_i

Deeply · August 7, 2018, 5:57pm

The documentation also says:

weight (Tensor, optional) – a manual rescaling weight given to the loss of each batch element. If given, has to be a Tensor of size “nbatch”

So the weight vector works at the batch level and not at the per-class level, or am I missing something here?

ptrblck · August 7, 2018, 6:01pm

Yeah, you are right! Thanks for pointing this out. I tried not to mix up the two new threads about weighting and thought we were dealing with a classification loss like nn.NLLLoss.

@Skinish have a look at this thread to see how to apply pos_weight instead.

Deeply · August 7, 2018, 9:27pm

I am not sure what pos_weight will do. BCEWithLogitsLoss has this pos_weight input argument (BCELoss does not), and it does not seem to fulfill what is needed (I could be mistaken). The way I see it, a simple way to resolve this issue is to multiply both the output and the target/label vector elementwise by the weight vector when computing the training loss, as follows:

criterion = nn.BCEWithLogitsLoss()
...
...
loss = criterion(W*output.float(), W*target.float())

For more flexibility and ease, the (balance) weight vector could be generated within the dataset class.

Deeply · August 8, 2018, 12:20am

I was trying to implement this weight multiplication, not sure if this is the best way to do it as I had to use torch.transpose twice, here it goes:

# 10 is the batch size, so each sample has a weight value

ipdb>weight.shape
torch.Size([10])  
ipdb>output.shape
torch.Size([10, 644])
ipdb>weight
tensor([ 0.9987,  0.9997,  0.9997,  0.9992,  0.9997,  0.9985,  0.9905,
         0.9911,  0.9476,  0.9944], dtype=torch.float64, device='cuda:0')

ipdb> output = torch.mul(weight, torch.transpose(output, 0, 1) )
ipdb> output = torch.transpose(output, 0,1)
ipdb>output.shape
torch.Size([644, 10])

NB. I have added the weights to the Dataset class

The whole thing will look like this:

if cf.use_weight_to_balance_data:
    weight = weight.to(device)
    output = torch.mul(weight, torch.transpose(output.double(), 0, 1))
    output = torch.transpose(output, 0, 1)
    target = torch.mul(weight, torch.transpose(target.double(), 0, 1))
    target = torch.transpose(target, 0, 1)

Skinish · August 8, 2018, 9:37am

@Deeply @ptrblck thank you for all your help. Is the weight multiplication the viable solution, then? I did not quite understand what would be wrong with passing the weights vector to the loss function (to the pos_weight argument), although I understand that the weight assignment I proposed should be changed.

What would be the effect of passing a vector to weight, then?

ptrblck · August 8, 2018, 11:49am

@Deeply I’m not sure it’s a good idea to multiply the output and target directly with the weight.
Both will pass a criterion and maybe a sigmoid, so I would rather multiply the loss.
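
A minimal sketch of multiplying the loss instead, assuming per-sample weights and the shapes from the debugging session above (the random tensors are placeholders). Broadcasting with unsqueeze(1) also avoids the double transpose:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

batch_size, nb_labels = 10, 644
output = torch.randn(batch_size, nb_labels, requires_grad=True)  # raw logits
target = torch.empty(batch_size, nb_labels).random_(2)           # 0/1 targets
sample_weight = torch.rand(batch_size)   # hypothetical per-sample weights

# Keep the per-element losses so they can be weighted before reduction.
criterion = nn.BCEWithLogitsLoss(reduction='none')
loss = criterion(output, target)         # shape: (10, 644)

# unsqueeze(1) gives shape (10, 1), which broadcasts across the label dimension.
loss = (loss * sample_weight.unsqueeze(1)).mean()
loss.backward()
```

The output and target reach the criterion unchanged here, so the targets stay valid 0/1 values and only each sample's contribution to the loss is rescaled.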

@Skinish here is a github issue discussing the introduction of pos_weight and a comparison between weight and pos_weight.
As you have different criteria for your targets and predictions, I think using pos_weight would work.
Let me know, if that works for you.
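
A minimal sketch of the pos_weight approach for a multi-label target. The toy targets and the negatives/positives ratio used as pos_weight are assumptions, one common choice and not something prescribed in the thread. Note that pos_weight belongs to nn.BCEWithLogitsLoss, so the model should output raw logits:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical multi-label targets: 8 samples, 3 labels.
target = torch.tensor([
    [1., 0., 0.],
    [0., 0., 0.],
    [0., 1., 0.],
    [0., 0., 0.],
    [1., 0., 1.],
    [0., 0., 0.],
    [0., 1., 0.],
    [0., 0., 0.],
])
output = torch.randn(8, 3)               # raw logits

# Per-label ratio of negatives to positives (assumes every label has
# at least one positive, otherwise this divides by zero).
num_pos = target.sum(dim=0)              # positives per label: 2, 2, 1
num_neg = target.shape[0] - num_pos      # negatives per label: 6, 6, 7
pos_weight = num_neg / num_pos           # 3, 3, 7

criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
loss = criterion(output, target)
```

For the hierarchical setup in the question, one such criterion per level, each with its own pos_weight vector, could be summed in the same way as the three losses above.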