Suppose I have a training set which consists of 4 classes and the number of samples belonging to the 4 classes is 20, 30, 40, 10 respectively. So should I pass the tensor torch.tensor([20,30,40,10]) / 100. to the weight argument of the loss function?

Or should I calculate the values of the weight argument for each batch on the fly in the training loop?

Hi Tejan!

You have this backwards – you want to weight the less-frequent classes more heavily in your loss function. The most common weighting scheme would be the reciprocal of what you have:

`100.0 / torch.tensor([20.0, 30.0, 40.0, 10.0])`

My preference is to calculate the weights using the frequency of classes in the entire training set and use this single set of weights for each batch.
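A minimal sketch of this approach, assuming the class counts of 20, 30, 40, 10 from the question (the variable names and the toy batch are illustrative):

```python
import torch
import torch.nn as nn

# class counts taken over the entire training set (from the question)
counts = torch.tensor([20.0, 30.0, 40.0, 10.0])

# inverse-frequency weights: rarer classes get larger weights
weights = counts.sum() / counts  # tensor([5.0000, 3.3333, 2.5000, 10.0000])

# build the criterion once with this single set of weights
# and reuse it for every batch in the training loop
criterion = nn.CrossEntropyLoss(weight=weights)

# toy batch: 8 samples, 4 classes
logits = torch.randn(8, 4)
targets = torch.randint(0, 4, (8,))
loss = criterion(logits, targets)
```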

Best.

K. Frank

Hi,

But shouldn’t the sum of the weight vector equal 1?

Hi Tejan!

No. `CrossEntropyLoss` computes a *weighted mean* (when using the default `reduction = 'mean'`). This means that `CrossEntropyLoss` divides by the sum of the weights, so the overall scale of the weights drops out of the final loss value.

(It doesn’t hurt to have the weights sum to one; it just doesn’t matter.)
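A quick check of this, assuming the inverse-frequency weights from earlier in the thread (the seed and toy batch are illustrative):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(8, 4)
targets = torch.randint(0, 4, (8,))

w = torch.tensor([5.0, 10.0 / 3.0, 2.5, 10.0])

# same weights, once raw and once normalized to sum to 1
loss_raw = nn.CrossEntropyLoss(weight=w)(logits, targets)
loss_normed = nn.CrossEntropyLoss(weight=w / w.sum())(logits, targets)

# with reduction='mean' the weighted sum is divided by the sum of
# the per-sample weights, so rescaling all weights by a constant
# leaves the loss unchanged
assert torch.allclose(loss_raw, loss_normed)
```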

Best.

K. Frank