Dealing with imbalanced datasets in PyTorch

snakers41 · August 8, 2018, 12:38pm

Hi guys, recently I played a lot with:

Weighted semantic segmentation
Imbalanced data (Google Open Images)

What worked for me:

Loss / mask weighting: showed a lot of improvement. Below is my loss, and here is the result description

import torch
import torch.nn as nn
import torch.nn.functional as F

class SemsegLossWeighted(nn.Module):
    def __init__(self,
                 use_running_mean=False,
                 bce_weight=1,
                 dice_weight=1,
                 eps=1e-10,
                 gamma=0.9,
                 use_weight_mask=False,
                 deduct_intersection=False
                 ):
        super().__init__()

        self.use_weight_mask = use_weight_mask

        self.nll_loss = nn.BCEWithLogitsLoss()
        self.dice_weight = dice_weight
        self.bce_weight = bce_weight
        self.eps = eps
        self.gamma = gamma

        self.use_running_mean = use_running_mean
        self.deduct_intersection = deduct_intersection

        if self.use_running_mean:
            # exponential running means of both loss terms, used for adaptive weighting
            self.register_buffer('running_bce_loss', torch.zeros(1))
            self.register_buffer('running_dice_loss', torch.zeros(1))
            self.reset_parameters()

    def reset_parameters(self):
        self.running_bce_loss.zero_()        
        self.running_dice_loss.zero_()            

    def forward(self,
                outputs,
                targets,
                weights=None):
        # outputs and targets are assumed to be BxCxWxH
        assert outputs.dim() == targets.dim()
        # assert that B, W and H are the same
        assert outputs.size(0) == targets.size(0)
        assert outputs.size(2) == targets.size(2)
        assert outputs.size(3) == targets.size(3)

        if self.use_weight_mask:
            # weights are assumed to be BxCxWxH, i.e. the same shape as outputs
            assert weights is not None and weights.shape == outputs.shape
            bce_loss = F.binary_cross_entropy_with_logits(input=outputs,
                                                          target=targets,
                                                          weight=weights)
        else:
            bce_loss = self.nll_loss(input=outputs,
                                     target=targets)

        dice_target = (targets == 1).float()
        dice_output = torch.sigmoid(outputs)  # F.sigmoid is deprecated

        intersection = (dice_output * dice_target).sum()
        if self.deduct_intersection:
            union = dice_output.sum() + dice_target.sum() - intersection + self.eps
        else:
            union = dice_output.sum() + dice_target.sum() + self.eps

        dice_loss = -torch.log(2 * intersection / union)

        if not self.use_running_mean:
            bmw = self.bce_weight
            dmw = self.dice_weight
            # loss += torch.clamp(1 - torch.log(2 * intersection / union),0,100)  * self.dice_weight
        else:
            # update the running means without tracking gradients
            self.running_bce_loss = self.running_bce_loss * self.gamma + bce_loss.detach() * (1 - self.gamma)
            self.running_dice_loss = self.running_dice_loss * self.gamma + dice_loss.detach() * (1 - self.gamma)

            bm = float(self.running_bce_loss)
            dm = float(self.running_dice_loss)

            # the term with the larger running mean gets the smaller weight
            bmw = 1 - bm / (bm + dm)
            dmw = 1 - dm / (bm + dm)

        loss = bce_loss * bmw + dice_loss * dmw

        return loss, bce_loss, dice_loss

Over / under sampling and / or sampling (link) - worked technically, but no accuracy boost
Analyzing the internal structure of data and building a cascade of models
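A per-pixel weight mask like the one this loss accepts can be built in many ways. The sketch below is one hypothetical option, not necessarily what was used here: weight each pixel by the inverse frequency of its own target value within its channel, rescale the mask to mean 1, and pass it (same BxCxWxH shape as the logits) to F.binary_cross_entropy_with_logits. The helper name frequency_weight_mask and the random data are assumptions for illustration.

```python
import torch
import torch.nn.functional as F


def frequency_weight_mask(targets: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Hypothetical helper: BxCxWxH weight mask that up-weights the rarer value per channel.

    Each pixel is weighted by 1 / (batch frequency of its own target value in its channel),
    and the mask is rescaled so that its mean is 1.
    """
    # fraction of positive pixels per channel over the whole batch, shape 1xCx1x1
    pos_frac = targets.float().mean(dim=(0, 2, 3), keepdim=True)
    pos_w = 1.0 / (pos_frac + eps)
    neg_w = 1.0 / (1.0 - pos_frac + eps)
    # broadcast the per-channel weights to every pixel according to its target value
    weights = torch.where(targets > 0.5, pos_w, neg_w)
    return weights / weights.mean()


torch.manual_seed(0)
logits = torch.randn(2, 3, 8, 8)
targets = (torch.rand(2, 3, 8, 8) > 0.9).float()   # roughly 10% positive pixels
weights = frequency_weight_mask(targets)             # same shape as logits
loss = F.binary_cross_entropy_with_logits(logits, targets, weight=weights)
```

Rescaling to mean 1 keeps the overall loss magnitude comparable to the unweighted case, so the learning rate does not have to be retuned as much.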

Hope this is helpful.

Deeply · August 8, 2018, 1:20pm

You are absolutely correct!
In such a case, it is better to use BCELoss instead of BCEWithLogitsLoss; hence, we need to apply the sigmoid to the output before multiplying it by the weight.
Or, if using BCEWithLogitsLoss(reduction='none'), multiplying the weight by the unreduced loss and taking the mean will do, as follows:

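# criterion is assumed to be nn.BCEWithLogitsLoss(reduction='none'), so it returns the per-element loss
if cf.use_weight_to_balance_data:
    weight = weight.to(device)
    loss = criterion(output, target)
    loss = torch.mul(weight, torch.transpose(loss.double(), 0, 1))
    loss = torch.mean(loss)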
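If the weight has one entry per class, shape (C,), no transpose is needed: it broadcasts directly over the (B, C) unreduced loss. A minimal sketch with random data and hypothetical weight values, which also checks that this matches the built-in weight argument of nn.BCEWithLogitsLoss:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, num_classes = 4, 3
output = torch.randn(batch_size, num_classes)                 # raw logits
target = torch.randint(0, 2, (batch_size, num_classes)).float()
weight = torch.tensor([1.0, 2.0, 0.5])                        # hypothetical per-class weights, shape (C,)

criterion = nn.BCEWithLogitsLoss(reduction='none')
loss = criterion(output, target)                              # shape (B, C): one loss per element
loss = (loss * weight).mean()                                 # (C,) broadcasts over the batch dimension

# the built-in weight argument performs the same multiplication before averaging
builtin = nn.BCEWithLogitsLoss(weight=weight)(output, target)
```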

As for pos_weight, the documentation says this:
where p_n is the positive weight of class n. p_n > 1 increases the recall, p_n < 1 increases the precision.

For example, if a dataset contains 100 positive and 300 negative examples of a single class, then pos_weight for the class should be equal to 300/100 = 3. The loss would act as if the dataset contains 3 × 100 = 300 positive examples.

Thus, I don’t think it would help in balancing the data. As an example of using pos_weight, imagine building a classifier for m diseases: each disease has statistics on its positive and negative cases, which can be used to estimate its pos_weight for the analysis.
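To make the quoted documentation example concrete: pos_weight is set to the negative/positive count ratio of a class and only scales the positive-target term of the loss. The sketch below (random logits and targets, an assumption for illustration) compares nn.BCEWithLogitsLoss(pos_weight=...) with a hand-written version of that formula.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_pos, num_neg = 100, 300
pos_weight = torch.tensor([num_neg / num_pos])     # 300 / 100 = 3.0

torch.manual_seed(0)
logits = torch.randn(8, 1)
targets = torch.randint(0, 2, (8, 1)).float()

loss = nn.BCEWithLogitsLoss(pos_weight=pos_weight)(logits, targets)

# manual version: log(sigmoid(x)) for positives is scaled by pos_weight,
# log(1 - sigmoid(x)) = logsigmoid(-x) for negatives is left unchanged
manual = -(pos_weight * targets * F.logsigmoid(logits)
           + (1 - targets) * F.logsigmoid(-logits)).mean()
```

For the m-disease case, pos_weight would be a tensor with one such ratio per disease, shape (m,).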

Deeply · August 8, 2018, 5:48pm

My latest method above ran without errors, but there was no improvement, so maybe balancing has no effect in my case. However, when I try to use:

criterion = nn.BCEWithLogitsLoss()
loss = criterion( output, target, weight=weight )

I got an error saying: got an unexpected keyword argument 'weight'
So I moved the weight to the class constructor (which I then had to put inside the batch loop), something like:

criterion = nn.BCEWithLogitsLoss(weight=weight)
loss = criterion( output, target )

which also gave an error saying:
RuntimeError: The size of tensor a (644) must match the size of tensor b (10) at non-singleton dimension 1

Thus, I am not sure if F.binary_cross_entropy_with_logits is different from nn.BCEWithLogitsLoss and that’s why my code is not running?!

snakers41 · August 9, 2018, 5:16am

Thus, I am not sure if F.binary_cross_entropy_with_logits is different from nn.BCEWithLogitsLoss and that’s why my code is not running?!

As far as I know, both of these are mostly the same; the difference is in where weight is passed: nn.BCEWithLogitsLoss takes it in its constructor, while F.binary_cross_entropy_with_logits takes it on every call.
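A small sketch of that difference, with random data and a hypothetical weight of shape (10,) that broadcasts over a (4, 10) output. It also reproduces the "unexpected keyword argument" error: the module's forward only accepts (input, target).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
output = torch.randn(4, 10)
target = torch.randint(0, 2, (4, 10)).float()
weight = torch.rand(10)   # broadcastable to (4, 10)

# module: weight is fixed in the constructor, the call takes only (input, target)
criterion = nn.BCEWithLogitsLoss(weight=weight)
loss_module = criterion(output, target)

# functional: weight is passed on every call
loss_functional = F.binary_cross_entropy_with_logits(output, target, weight=weight)

# passing weight to the module call fails with a TypeError
try:
    criterion(output, target, weight=weight)
except TypeError as err:
    error_message = str(err)   # contains "unexpected keyword argument 'weight'"
```

In both variants the weight must be broadcastable to the output shape; otherwise a size-mismatch RuntimeError like the one above is raised.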
As far as I can see, the docs say that the weight will be broadcast, but in my case either of these approaches worked with F.binary_cross_entropy_with_logits (if I remember correctly):

Make your weights WxH
Or make your weights BxCxWxH
If you try BxWxH or CxWxH, I guess there will be an error
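Under standard broadcasting rules, the weight only has to line up with the trailing dimensions of the BxCxWxH loss. So CxWxH (and Bx1xWxH) should be accepted as well, while BxWxH fails because B is compared against C. A quick check with random data and arbitrary sizes chosen so that B != C:

```python
import torch
import torch.nn.functional as F

B, C, W, H = 2, 3, 4, 5
logits = torch.randn(B, C, W, H)
targets = torch.randint(0, 2, (B, C, W, H)).float()


def try_weight(shape):
    """Return True if a weight of this shape is accepted for a BxCxWxH loss."""
    try:
        F.binary_cross_entropy_with_logits(logits, targets, weight=torch.ones(shape))
        return True
    except RuntimeError:
        return False


shapes = {
    "WxH": (W, H),
    "BxCxWxH": (B, C, W, H),
    "CxWxH": (C, W, H),
    "BxWxH": (B, W, H),
    "Bx1xWxH": (B, 1, W, H),
}
results = {name: try_weight(shape) for name, shape in shapes.items()}
```

A BxWxH mask can still be used by adding the channel dimension explicitly, e.g. mask.unsqueeze(1).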

Skinish · August 9, 2018, 8:48am

Thank you for your feedback. Could you please explain further what kind of loss weighting you did here? By that I mean, what were the weights that you used? And what is the main difference between F.binary_cross_entropy_with_logits with a weight argument and nn.BCEWithLogitsLoss with a weight / pos_weight argument?

Skinish · August 9, 2018, 8:51am

From what I see, applying pos_weight in BCEWithLogitsLoss does make the total loss higher, which is what was intended, but the results are the same or even worse. Maybe the loss becomes harder to minimize?

ptrblck · August 9, 2018, 10:58am

If the overall loss increases, I would try to lower the learning rate to help the model converge.

Skinish · August 9, 2018, 11:14am

I am using Adam, so it should not make much difference.

ptrblck · August 9, 2018, 11:19am

Probably, but it’s still worth a try

thierry007 · February 5, 2019, 10:47am

Hi,

I am trying to deal with imbalanced data. Based on what I read in the discussions above, I compute the frequency of each class as:
nr_samples_of_label_i / total_number_of_samples

For instance, out of 250000 samples, one of the overrepresented classes contains 150000 samples, so
150000 / 250000 = 0.6
For one of the underrepresented classes:
20000 / 250000 = 0.08

So to reduce the impact of the overrepresented class, I multiply its loss by 1 - 0.6 = 0.4.
To increase the impact of the underrepresented class, I multiply its loss by 1 - 0.08 = 0.92.

Is that an acceptable way of working?
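Both schemes give the rarer class a larger weight, but 1 - frequency compresses the ratio between classes considerably. A sketch comparing it with inverse-frequency ("balanced") weights, using the counts above plus a hypothetical third bucket for the remaining 80000 samples:

```python
import torch
import torch.nn as nn

# class counts from the example; the third entry is a hypothetical bucket for the other 80000 samples
counts = torch.tensor([150000.0, 20000.0, 80000.0])
freq = counts / counts.sum()                          # 0.60, 0.08, 0.32

one_minus = 1 - freq                                  # 0.40, 0.92, 0.68
inverse = counts.sum() / (len(counts) * counts)       # "balanced" inverse-frequency weights

# how much more the rare class counts than the frequent one under each scheme
ratio_one_minus = (one_minus[1] / one_minus[0]).item()   # ≈ 2.3
ratio_inverse = (inverse[1] / inverse[0]).item()         # ≈ 7.5

criterion = nn.CrossEntropyLoss(weight=inverse)
```

With 1 - frequency, a sample of the rare class counts only about 2.3 times as much as one of the frequent class, whereas inverse-frequency weighting gives the 7.5 ratio that exactly offsets the 150000:20000 imbalance. Either tensor can be passed as the weight of nn.CrossEntropyLoss; which works better is an empirical question.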

Thanks

n0obcoder · July 1, 2019, 8:24am

Let’s say I have to train an image classifier on a highly unbalanced dataset. Then I would like to penalize the losses belonging to the dominating classes less, and vice versa!

Can you please show with a few lines of code how exactly the weights in nn.CrossEntropyLoss are passed?

Imagine we have a dataset with three classes and the following number of examples:
classA: 900
classB: 90
classC: 10
Now how would you define your loss function, and how would you pass the weight argument?

Would it be like:
loss_fn = nn.CrossEntropyLoss(weight = [900/1000, 90/1000, 10/1000]) ???

Isaac_Kargar · February 16, 2020, 7:37am

What about continuous data for regression tasks? Is there any way to handle an imbalanced dataset in that case?

ptrblck · February 16, 2020, 7:39am

You could still use weights to sample your data, but you would first have to define what imbalance means for your targets (e.g. do you have different clusters of neighboring values?) and then use this definition to create the weights.
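One way to follow this suggestion, as a sketch: bin the continuous targets (the bin edges below are an arbitrary assumption), count the samples per bin, and give each sample a weight inversely proportional to its bin size for torch.utils.data.WeightedRandomSampler. The targets and features are synthetic.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

torch.manual_seed(0)
# hypothetical regression targets: 900 values in [0, 10), only 100 in [10, 20)
targets = torch.cat([torch.rand(900) * 10, 10 + torch.rand(100) * 10])
inputs = torch.randn(len(targets), 4)

# define "imbalance" by binning the targets
edges = torch.tensor([5.0, 10.0, 15.0])
bin_idx = torch.bucketize(targets, edges)            # bin index 0..3 per sample
bin_counts = torch.bincount(bin_idx, minlength=len(edges) + 1).float()

# each sample is drawn with probability proportional to 1 / (size of its bin),
# so every bin contributes roughly the same number of samples per epoch
sample_weights = 1.0 / bin_counts[bin_idx]

sampler = WeightedRandomSampler(sample_weights, num_samples=len(targets), replacement=True)
loader = DataLoader(TensorDataset(inputs, targets), batch_size=100, sampler=sampler)
```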

errezeta · March 4, 2020, 11:19am

Hi, I’m also struggling with how to assign weights to my imbalanced data for a regression task.
In my case I’m building an LSTM-based model to predict a float number that varies between 30.0 and 81.3.
For example, in the range from 30.6 to 39.6 I have 716056 samples in total, whereas in the range from 70.0 to 81.3 I have only 135010 samples.
I would like to use weights to counteract this imbalance, but I don’t know how to proceed for a regression task, where the target can be any number between 30 and 80.
Thank you very much in advance; any help would be highly appreciated.

ptrblck · March 4, 2020, 1:21pm

Your idea of clustering the regression targets into a few clusters and assigning weights to them seems reasonable.
You could either do it manually (as seems to be the case now) or use something like k-means.
Once you have the clusters, you could count the samples per cluster as in a classification task, calculate the weights from these counts, and create a mapping between each cluster and its weight.
After creating the weights, you could write a function which accepts the current output batch with the regression predictions as well as your cluster centers (k-means dict) and returns a batch of weights, which can then be multiplied with the unreduced loss to create the final loss.
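A minimal sketch of these steps, under a few assumptions: the targets are 1D floats, the cluster centers are chosen by hand (k-means would produce them the same way), each cluster's weight is inversely proportional to its size, and the weight lookup uses the target values of the batch. The function names and the synthetic targets are hypothetical.

```python
import torch


def assign_clusters(values: torch.Tensor, centers: torch.Tensor) -> torch.Tensor:
    """Index of the nearest cluster center for every value (1D targets)."""
    return (values.unsqueeze(1) - centers.unsqueeze(0)).abs().argmin(dim=1)


def cluster_weights(train_targets: torch.Tensor, centers: torch.Tensor) -> torch.Tensor:
    """One weight per cluster, inversely proportional to the number of samples in it."""
    counts = torch.bincount(assign_clusters(train_targets, centers), minlength=len(centers)).float()
    # clamp avoids division by zero for empty clusters (their weight is never looked up)
    return counts.sum() / (len(centers) * counts.clamp(min=1))


def weighted_mse(pred, target, centers, weights):
    """MSE where each sample is scaled by the weight of its target's cluster."""
    w = weights[assign_clusters(target, centers)]
    return (w * (pred - target) ** 2).mean()


torch.manual_seed(0)
# hypothetical targets: many samples in 30-40, few in 70-81.3
train_targets = torch.cat([30 + torch.rand(700) * 10, 70 + torch.rand(130) * 11.3])
centers = torch.tensor([35.0, 75.0])
weights = cluster_weights(train_targets, centers)

pred = train_targets[:8] + torch.randn(8)
loss = weighted_mse(pred, train_targets[:8], centers, weights)
```

With this normalisation, every cluster contributes the same total weight over the training set, and the per-sample weights sum to the number of samples.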

A_Rza_SH · March 7, 2020, 12:56pm

weight = torch.tensor([900/1000, 90/1000, 10/1000], dtype=torch.float, device='cuda:0')
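Note that weights proportional to the class counts give the dominating class the largest weight, which is the opposite of what the question asks for. To down-weight frequent classes, the weights are usually chosen inversely proportional to the counts. A minimal sketch (dividing by the number of classes is one common convention, not a requirement; the logits and labels are random):

```python
import torch
import torch.nn as nn

counts = torch.tensor([900.0, 90.0, 10.0])        # classA, classB, classC
# inverse-frequency weights: the rarest class gets the largest weight
weight = counts.sum() / (len(counts) * counts)    # ≈ 0.37, 3.70, 33.33

# weight must be a 1D float tensor with one entry per class,
# on the same device as the logits (e.g. weight.to('cuda:0') for a GPU model)
criterion = nn.CrossEntropyLoss(weight=weight)

torch.manual_seed(0)
logits = torch.randn(5, 3)                        # (batch, num_classes), raw scores
labels = torch.tensor([0, 0, 1, 2, 0])            # class indices
loss = criterion(logits, labels)
```

With the default reduction='mean', the weighted per-sample losses are divided by the sum of the weights of the targets in the batch, not by the batch size.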

errezeta · March 9, 2020, 9:12am

Thank you very much for your answer. That makes sense.
I will need some time to implement it, I will get back to you when I have something to share.

sasha · March 4, 2022, 7:53pm

Hi,
I am working on very imbalanced data. I have binary targets, 0 and 1, where class 1 makes up only ~0.27% of my data.
I want to weight my loss (BCELoss) by class, so I computed class weights as follows:

import numpy as np
import torch
from sklearn.utils import class_weight
y = np.ravel(y_train, order='C')
class_weights = class_weight.compute_class_weight('balanced', classes=np.unique(y), y=y)
class_weights = torch.tensor(class_weights, dtype=torch.float)

I am new to PyTorch. In Keras, I just need to pass class_weight to my fit function and it handles this for me.

However, I can’t figure out how to do this in PyTorch. I used:

torch.nn.functional.binary_cross_entropy(out, labels, class_weights)

and I am receiving the following error:


RuntimeError: output with shape [250, 1] doesn't match the broadcast shape [250, 2]

I have read this post and the others, but could not find a good answer on how to do this in PyTorch.
Am I missing something?
Any help would be really appreciated!!

ptrblck · March 5, 2022, 3:15am

The weight argument is used to weight each sample in the inputs, not the classes.
I think you might want to use the pos_weight argument in nn.BCEWithLogitsLoss instead to counter the imbalance.
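A sketch of that suggestion for a positive rate of about 0.27% (the labels and features are synthetic, and the small classifier head is hypothetical). pos_weight is the negative/positive count ratio, and because BCEWithLogitsLoss applies the sigmoid internally, the model has to output raw logits, i.e. without a final nn.Sigmoid().

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# hypothetical labels with about 0.27% positives, shaped (N, 1) like the model output
labels = (torch.rand(100000, 1) < 0.0027).float()

num_pos = labels.sum()
num_neg = labels.numel() - num_pos
pos_weight = (num_neg / num_pos).reshape(1)    # one entry per output unit, roughly 370

# BCEWithLogitsLoss applies the sigmoid itself, so the model ends with a Linear layer
classifier = nn.Sequential(nn.Linear(32, 12), nn.ReLU(True), nn.Dropout(0.3), nn.Linear(12, 1))
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

x = torch.randn(250, 32)                        # hypothetical batch of features
logits = classifier(x)                          # shape (250, 1)
loss = criterion(logits, labels[:250])

# at inference time, apply the sigmoid explicitly to get probabilities
probs = torch.sigmoid(logits)
```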

sasha · March 8, 2022, 6:17pm

Thanks, @ptrblck!
I have used pos_weight and did not get a good result. How can I use the weight argument? I used it the way I explained in my previous question above. My last layer is a linear layer followed by a sigmoid:
self.classifier = nn.Sequential( nn.Linear(32, 12), nn.ReLU(True), nn.Dropout(0.3), nn.Linear(12, 1), nn.Sigmoid())
and I received the error above. Why do I get this error?

Also, could you explain in more detail what that means?
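The error comes from the shape of class_weights. In F.binary_cross_entropy the weight is broadcast against the target, so a (2,) weight combined with a (250, 1) target becomes (250, 2), which no longer matches the (250, 1) output. A minimal sketch of one way to use class weights with a sigmoid output: turn them into per-sample weights by indexing with the labels. The data is synthetic and the weight values are illustrative (roughly what sklearn's 'balanced' mode gives for a ~0.27% positive rate).

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
out = torch.rand(250, 1) * 0.98 + 0.01          # stand-in for sigmoid outputs, shape (250, 1)
labels = (torch.rand(250, 1) < 0.05).float()   # hypothetical binary targets, shape (250, 1)
class_weights = torch.tensor([0.5, 185.0])     # [weight of class 0, weight of class 1]

# Passing class_weights directly fails: broadcast against the (250, 1) target it becomes
# (250, 2), which cannot be applied to the (250, 1) loss.
try:
    F.binary_cross_entropy(out, labels, class_weights)
except RuntimeError as err:
    error_message = str(err)

# Instead, pick one weight per sample according to its label.
sample_weights = class_weights[labels.long()]  # shape (250, 1)
loss = F.binary_cross_entropy(out, labels, weight=sample_weights)
```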