Suppose I have a training set which consists of 4 classes and the number of samples belonging to the 4 classes is 20, 30, 40, 10 respectively. So should I pass the tensor torch.tensor([20,30,40,10]) / 100. to the weight argument of the loss function?

Or should I calculate the values of the weight argument for each batch on the fly in the training loop?

Hi Tejan!

You have this backwards – you want to weight the less-frequent classes more heavily in your loss function. The most common weighting scheme would be the reciprocal of what you have:

`100.0 / torch.tensor([20.0, 30.0, 40.0, 10.0])`

My preference is to calculate the weights using the frequency of classes in the entire training set and use this single set of weights for each batch.
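A minimal sketch of this approach, assuming the class counts of 20, 30, 40, 10 from the question (the variable names and the toy batch are illustrative):

```python
import torch
import torch.nn as nn

# class counts taken over the entire training set (from the question)
counts = torch.tensor([20.0, 30.0, 40.0, 10.0])

# inverse-frequency weights: rarer classes get larger weights
weights = counts.sum() / counts  # tensor([5.0000, 3.3333, 2.5000, 10.0000])

# build the criterion once with this single set of weights
# and reuse it for every batch in the training loop
criterion = nn.CrossEntropyLoss(weight=weights)

# toy batch: 8 samples, 4 classes
logits = torch.randn(8, 4)
targets = torch.randint(0, 4, (8,))
loss = criterion(logits, targets)
```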

Best.

K. Frank

Hi,

But shouldn’t the sum of the weight vector equal 1?

Hi Tejan!

No. `CrossEntropyLoss` computes a *weighted mean* (when using the default `reduction = 'mean'`). This means that `CrossEntropyLoss` divides by the sum of the weights, so the overall scale of the weights drops out of the final loss value.

(It doesn’t hurt to have the weights sum to one; it just doesn’t matter.)
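A quick check of this, assuming the inverse-frequency weights from earlier in the thread (the seed and toy batch are illustrative):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(8, 4)
targets = torch.randint(0, 4, (8,))

w = torch.tensor([5.0, 10.0 / 3.0, 2.5, 10.0])

# same weights, once raw and once normalized to sum to 1
loss_raw = nn.CrossEntropyLoss(weight=w)(logits, targets)
loss_normed = nn.CrossEntropyLoss(weight=w / w.sum())(logits, targets)

# with reduction='mean' the weighted sum is divided by the sum of
# the per-sample weights, so rescaling all weights by a constant
# leaves the loss unchanged
assert torch.allclose(loss_raw, loss_normed)
```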

Best.

K. Frank