Dealing with imbalanced datasets in PyTorch

The weight argument is not used as a “class weight”, since nn.BCE(WithLogits)Loss allows for floating point targets. While you can interpret targets of 0 and 1 as class0 and class1, respectively, you could also use e.g. 0.9 as a target value.
This is why you specify a weight for each sample in the batch, which weights the loss of that particular sample.
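A minimal sketch of this per-sample usage, with assumed toy shapes and values:

```python
import torch
import torch.nn as nn

batch_size = 4

# Hypothetical logits and float targets; soft labels such as 0.9 are valid
logits = torch.randn(batch_size, 1)
targets = torch.tensor([[0.0], [1.0], [0.9], [0.0]])

# One weight per sample in the batch, matching the target shape
sample_weights = torch.tensor([[1.0], [2.0], [1.0], [0.5]])

criterion = nn.BCEWithLogitsLoss(weight=sample_weights)
loss = criterion(logits, targets)
print(loss)
```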

Thanks so much, @ptrblck for your support!!

Sorry for not being clear in my question. I meant to ask why the error says:
RuntimeError: output with shape [250, 1] doesn't match the broadcast shape [250, 2]
As you explained, the weight is used by the loss to penalize the outcome. The class weight I passed to my loss is [1, 365]. Since I use binary cross-entropy loss, I thought I had to use a sigmoid and a single node in the last layer.
However, if I understand correctly, the error seems to say it expects 2 values per sample (250 is my batch size). Should I change the last layer to two nodes and use softmax?

Also, do you recommend using WeightedRandomSampler to get balanced classes in each batch during training?

That’s the issue, as the weight argument is not a class weight, but a sample weight.
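A plausible reading of the error: the weight tensor of shape [2] is broadcast against the [250, 1] output to [250, 2], which no longer matches. If you want to keep a single output node with a sigmoid, pos_weight (rather than weight) is the argument intended for weighting the positive class. A minimal sketch with assumed shapes:

```python
import torch
import torch.nn as nn

logits = torch.randn(250, 1)                      # single output node
targets = torch.randint(0, 2, (250, 1)).float()

# Likely reproduces the error: a weight of shape [2] broadcasts [250, 1]
# up to [250, 2], which no longer matches the output shape.
# criterion = nn.BCEWithLogitsLoss(weight=torch.tensor([1.0, 365.0]))

# pos_weight scales the loss contribution of the positive class,
# which is the intended way to counter class imbalance here.
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([365.0]))
loss = criterion(logits, targets)
print(loss)
```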

Yes, I think balancing the samples in each batch is a good approach.
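As a rough sketch of WeightedRandomSampler, assuming a toy dataset with a heavy imbalance and weighting each sample by the inverse frequency of its class:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Assumed toy dataset with a strong class imbalance
data = torch.randn(1000, 10)
labels = torch.cat([torch.zeros(990), torch.ones(10)]).long()
dataset = TensorDataset(data, labels)

# Weight each sample by the inverse frequency of its class
class_counts = torch.bincount(labels)
sample_weights = (1.0 / class_counts.float())[labels]

sampler = WeightedRandomSampler(weights=sample_weights,
                                num_samples=len(sample_weights),
                                replacement=True)

# Each batch of 250 should now be approximately class-balanced
loader = DataLoader(dataset, batch_size=250, sampler=sampler)
```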
