I have a multi-class classification problem (5 classes) with imbalanced data. I was wondering whether we could let the model learn the best class weights by making the "weight" option of CrossEntropyLoss a learnable parameter.
Is this a valid approach or not?
First, you would not want to. Consider what would happen if you tried: your optimizer would drive your loss to zero simply by driving all of the learnable class weights to zero (with reduction = 'sum'), or it would drive only one class to have a non-zero weight and drive your model to predict only that class (with reduction = 'mean').
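To see the first failure mode concretely, here is a minimal sketch. It uses toy random logits as a stand-in for a real model, a hand-rolled weighted cross entropy (since the built-in one won't differentiate through the weights), and clamps the weights to stay nonnegative; the optimizer then collapses all of the class weights to zero under reduction = 'sum':

```python
import torch

torch.manual_seed(0)

# Toy stand-in for model outputs: 5 classes, random logits and labels
logits = torch.randn(64, 5)
labels = torch.randint(0, 5, (64,))
log_probs = torch.log_softmax(logits, dim=1)
nll = -log_probs[torch.arange(64), labels]   # per-sample loss, held fixed here

# Hand-rolled weighted cross entropy with *learnable* class weights,
# using reduction = 'sum' (per-sample losses weighted, then summed)
w = torch.ones(5, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.05)

for _ in range(200):
    loss = (w[labels] * nll).sum()           # weighted sum over the batch
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():
        w.clamp_(min=0.0)                    # keep the weights nonnegative

# The optimizer never improved the "model" -- it just shrank the weights.
print(w)                                     # all five weights collapse to zero
print((w[labels] * nll).sum())               # and the "loss" is driven to zero
```

With reduction = 'mean' the weighted loss is divided by the sum of the weights, so uniformly shrinking them no longer helps; instead the optimizer shifts all of the weight onto whichever class the model finds cheapest to predict, which is the second degenerate outcome described above.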
Second, as written, PyTorch's CrossEntropyLoss doesn't support calculating gradients with respect to the class weights. (You could, of course, write your own version of CrossEntropyLoss that did, but, in line with my first comment, you wouldn't want to.)