How to calculate class weights for a token-level classification problem?

TARUN_BHATIA · February 16, 2022, 3:16pm

In each of my sentences, the 0 labels are far fewer than the 1s (this is token-level classification). I train in batches and compute the loss (CrossEntropy) after each batch. How should I create the class weight vector and use it in the loss calculation? Please suggest!

ptrblck · February 17, 2022, 2:20am

You could initialize the weights as e.g. the inverse class frequency, which would then add more weight to the rare classes.

TARUN_BHATIA · February 17, 2022, 10:10am

Thanks @ptrblck, so it would be like class_freq_zeros = 1 / <no. of 0 tokens in the whole batch of 8 sentences>, and similarly for the 1s?
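
A minimal sketch of that per-batch idea, assuming logits of shape (batch, seq_len, num_classes), integer labels of shape (batch, seq_len), and padding tokens marked with -100. The helper name `batch_class_weights`, the tensor shapes, and the rescaling so the weights sum to the number of classes are illustrative assumptions, not something stated in this thread.

```python
import torch
import torch.nn.functional as F

def batch_class_weights(labels, num_classes=2, ignore_index=-100):
    """Inverse-count weights from the labels of one batch (hypothetical helper)."""
    # Drop padding positions before counting.
    valid = labels[labels != ignore_index]
    counts = torch.bincount(valid, minlength=num_classes).float()
    # Clamp so a class that is absent from this batch does not divide by zero.
    weights = 1.0 / counts.clamp(min=1.0)
    # Optional rescaling (an assumption): make the weights sum to num_classes.
    return weights / weights.sum() * num_classes

torch.manual_seed(0)
logits = torch.randn(8, 16, 2)                 # 8 sentences, 16 tokens, 2 classes
labels = torch.ones(8, 16, dtype=torch.long)   # mostly class 1
labels[:, 0] = 0                               # one rare class-0 token per sentence
labels[:, -2:] = -100                          # last two positions are padding

weights = batch_class_weights(labels)
loss = F.cross_entropy(
    logits.view(-1, 2),      # flatten to (batch * seq_len, num_classes)
    labels.view(-1),         # flatten to (batch * seq_len,)
    weight=weights,
    ignore_index=-100,
)
```

Note that the weights change from batch to batch here, so a batch with very few 0 tokens will weight them much more heavily than a batch with many.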

ptrblck · February 17, 2022, 6:16pm

Yes, a per-batch weighting would work, but you should also consider checking the class distribution of the entire dataset and setting the weights once before training, to see which approach works better.
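
As a rough sketch of the dataset-level variant, the weights could be computed once from all training labels and passed to `nn.CrossEntropyLoss`. The helper name `dataset_class_weights`, the toy `train_labels` list, and the `total / (num_classes * count)` scaling (the "balanced" heuristic) are assumptions for illustration; plain `1 / count` would also fit what was described above.

```python
from collections import Counter

import torch
import torch.nn as nn

def dataset_class_weights(label_sequences, num_classes=2):
    """Weights from token counts over the whole training set (hypothetical helper)."""
    counts = Counter(label for seq in label_sequences for label in seq)
    # Only real classes are counted; markers such as -100 are skipped.
    total = sum(counts[c] for c in range(num_classes))
    return torch.tensor(
        [total / (num_classes * max(counts[c], 1)) for c in range(num_classes)],
        dtype=torch.float,
    )

# Toy stand-in for the per-sentence token labels of a training set.
train_labels = [[1, 1, 0, 1], [1, 1, 1, 1, 1, 0], [1, 1]]

weights = dataset_class_weights(train_labels)   # 2 zeros and 10 ones in total
criterion = nn.CrossEntropyLoss(weight=weights, ignore_index=-100)
```

The same `criterion` is then reused for every batch, so the weighting stays fixed throughout training, which makes it easy to compare against the per-batch variant.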