Unbalanced Data, Overfitting occurs

I am currently working on binary segmentation.

When training on unbalanced data, overfitting occurs.
Below is a loss curve.
[loss curve plot]
I am using BCEWithLogitsLoss, and pos_weight is calculated as neg/pos.

The data distribution has a ratio of [data without segments : data with segments] ≈ 20:1.

I varied the learning rate from 1e-3 to 1e-7, applied a weight_decay of 1e-4, and used dropout.

I tried the various methods found on the PyTorch forum, but the results did not improve.
I think it's because the data is too unbalanced.
How can I solve this?

Thanks for reading!

Did you check the confusion matrix for the original loss function and for the one using pos_weight, and did something change?
I would recommend using unreasonably large or small values for the weighting and making sure that your training achieves the desired effect. You could also try WeightedRandomSampler for over-/undersampling.
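For the confusion matrix, something like this could work as a quick check (just a sketch for binary segmentation; the function name, shapes, and threshold are assumptions about your setup):

import torch

@torch.no_grad()
def pixel_confusion_matrix(logits, target, threshold=0.0):
    # logits and target should have the same shape, e.g. [N, 1, H, W]
    # sigmoid(x) > 0.5 is equivalent to x > 0, so the raw logits can be thresholded directly
    pred = (logits > threshold)
    target = target.bool()
    tp = (pred & target).sum().item()
    fp = (pred & ~target).sum().item()
    fn = (~pred & target).sum().item()
    tn = (~pred & ~target).sum().item()
    return tp, fp, fn, tn

If switching pos_weight between extreme values doesn't move these counts at all, the weighting isn't having the intended effect.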

Thank you for the reply!

I am computing pos_weight by counting the number of negative and positive samples, as shown below.

To compare before and after applying pos_weight, I checked whether the training loss (and Dice) decreased.

        data_loader = torch.utils.data.DataLoader(traindataset, batch_size=batch_size, shuffle=True, num_workers=0, pin_memory=False)

        # count samples whose mask contains at least one positive pixel
        pos = 0.0
        neg = 0.0
        for b, (x, y) in enumerate(data_loader):
            for p in range(len(y)):
                if y[p][0].max() == 1:
                    pos += 1
                else:
                    neg += 1
        weights = neg / pos
        pos_weight = torch.tensor(weights)



        for batch_idx, (data, target) in enumerate(data_loader):
            inputs, target = data.to(device), target.to(device)
            ...
            criterion = nn.BCEWithLogitsLoss(reduction='mean', pos_weight=pos_weight)

Should I set the value of pos_weight arbitrarily instead of calculating it like this?

I also read about the sampler from the link.
About WeightedRandomSampler

The confusing part of this sampler is the binary case, where a sample either has a label (the mask contains 1s) or no label (the mask is all 0s).
What value should I put in class_sample_count = []?

class_sample_count = [positive_count] ? or
class_sample_count = [negative_count,positive_count]?

Yes, for the sake of debugging, try out different values and check the confusion matrix to verify that the predictions really change.

You should use both classes to calculate the class count and assign the computed weight to each sample.
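For your binary segmentation case that could look roughly like this (a sketch; I'm assuming a sample counts as positive if its mask contains any positive pixel, and `dataset` is your training dataset):

# 1 if the mask contains at least one positive pixel, else 0
labels = [int(y.max() == 1) for _, y in dataset]

class_sample_count = [labels.count(0), labels.count(1)]  # [negative_count, positive_count]
weights = 1. / torch.tensor(class_sample_count, dtype=torch.float)
samples_weights = weights[torch.tensor(labels)]          # one weight per sample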

  1. I ran the following debugging experiments with the same settings.

1) pos_weight < 0; scale of about 0.2
2) pos_weight = None
3) pos_weight >= 1; values from 1 to 100 (e.g. 10)

As a result,
In 1), I confirmed that the loss decreased similarly for both train and valid, but the confusion matrix and Dice values did not improve from epoch to epoch.

In 2), both the train loss and the valid loss fluctuated a lot.

In 3), similar to the graph in the question, the train loss decreased well, but the valid loss fluctuated significantly.

I think case 3) is the appropriate one, but are there other factors that could reduce the fluctuation of the valid loss?

  2. I configured the code below to use WeightedRandomSampler, but I am not sure how to load the targets because I am using a ConcatDataset.
        train_datasets = []
        for i in range(5):
            secure_random = random.SystemRandom()
            random_patient = secure_random.choice(patient_index)
            train_datasets.append(trainDataset(random_patient, "data_path", augmentation=True))
            patient_index.remove(random_patient)
        traindataset = torch.utils.data.ConcatDataset(train_datasets)

        class_sample_count = [neg, pos]  # I already calculated the negative/positive counts
        weights = 1. / torch.tensor(class_sample_count, dtype=torch.float)
        samples_weights = [weights[t] for t in target]  # how do I get `target` here?
        sampler = torch.utils.data.sampler.WeightedRandomSampler(weights=samples_weights, num_samples=len(samples_weights), replacement=True)
        data_loader = torch.utils.data.DataLoader ....

Is there a way to load the targets for this line: samples_weights = [weights[t] for t in target]?

My custom dataset (=trainDataset) follows the structure below.

import numpy as np
import torch

class trainDataset(torch.utils.data.Dataset):
    def __init__(self, i, data_path, augmentation=True):
        # i is the patient index; data_path / target_path point to the .npy files
        self.data_path = data_path
        self.data = np.load(data_path)
        self.target = np.load(target_path).astype(np.uint8)  # target_path is defined elsewhere (not shown)
        # normalize the input data
        self.data = self.data - self.data.mean()
        self.data = self.data / self.data.std()
        self.augmentation = augmentation

    def __getitem__(self, index):
        x = self.data[index]
        y = self.target[index]
        x, y = self.transform(x, y)
        return x, y

    def transform(self, data, target):
        # data_augmentation is a custom helper (not shown)
        data, target = data_augmentation(data, target, self.augmentation)
        return data, target

    def __len__(self):
        return len(self.data)
  3. In my opinion, if I use WeightedRandomSampler, using pos_weight on top of it doesn't seem to make sense. Is this correct?

Thanks!

Did your model predictions change from the negative to the positive class or vice versa when you changed the pos_weight?

I don’t think a negative pos_weight would make sense, would it?
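For reference, pos_weight only scales the positive term of the loss, which is why a negative value would flip its sign. Per element the loss is roughly (a simplified sketch, ignoring the numerically stable formulation used internally):

import torch

# x: raw logit, y: target in {0, 1}
def bce_with_logits(x, y, pos_weight=1.0):
    p = torch.sigmoid(x)
    return -(pos_weight * y * torch.log(p) + (1 - y) * torch.log(1 - p))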

If you are creating the targets lazily in each iteration, you could create the target tensor once before the training via:

targets = []
for _, target in dataset:
    targets.append(target)
targets = torch.stack(targets)

and use it to create the WeightedRandomSampler.
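From there, building the sampler could look roughly like this (a sketch, assuming targets are the stacked segmentation masks and reusing your neg/pos weighting):

# reduce each mask to a per-sample binary label
labels = (targets.view(targets.size(0), -1).max(dim=1).values > 0).long()

class_sample_count = torch.bincount(labels, minlength=2).float()  # [neg, pos]
weights = 1. / class_sample_count
samples_weights = weights[labels]

sampler = torch.utils.data.WeightedRandomSampler(
    weights=samples_weights,
    num_samples=len(samples_weights),
    replacement=True)
data_loader = torch.utils.data.DataLoader(traindataset, batch_size=batch_size, sampler=sampler)

Note that shuffle should not be set when a sampler is passed to the DataLoader.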

Might be the case, but I would recommend to run experiments and verify that each step is working as expected.

Oh, I wrote the wrong thing about 1). The correct range is 0 < pos_weight < 1.

The model's predictions change from all 0 to values below 1 unless the value of pos_weight is less than 1.

Here is an example of training when pos_weight is greater than 1.


[train/valid loss plot]
As you can see from the valid loss values above, the loss increases and decreases sharply at certain moments.

In addition, when I create targets using the method you described, I get the following error.
[error screenshot]

Is it wrong to create targets and weight each sample with samples_weights = [weights[t] for t in targets]?

Thanks!