CrossEntropyLoss weight source

At some point, for reasons I no longer remember, I switched the source of the CrossEntropyLoss weight vector:

  • originally: the 1/(class samples) for the entire training set
  • currently: the 1/(class samples) for the current minibatch
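As a toy sketch of the difference (the labels, class counts, and batch below are all made up for illustration):

```python
from collections import Counter

def inverse_freq_weights(labels, num_classes):
    """Per-class 1/(class samples), suitable for passing (as a tensor)
    to nn.CrossEntropyLoss(weight=...). A class absent from `labels`
    gets weight 0.0 here; a real run would need some fallback."""
    counts = Counter(labels)
    return [1.0 / counts[c] if counts[c] else 0.0 for c in range(num_classes)]

# "originally": weights computed once over the entire training set
train_labels = [0, 0, 0, 1, 1, 2]      # toy 3-class dataset
global_w = inverse_freq_weights(train_labels, 3)

# "currently": weights recomputed from each minibatch
batch_labels = [0, 1, 1]               # one randomly sampled batch
batch_w = inverse_freq_weights(batch_labels, 3)

print(global_w)   # inverse frequencies over the whole dataset
print(batch_w)    # note: class 2 is missing from this batch entirely
```

Note the failure mode the per-batch version has to handle: a small batch can miss a class altogether, so its weight is undefined (here set to 0.0).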

The current method has been working fine, but I can’t find any post online that mentions using the minibatch distribution. My after-the-fact justification is that, since the minibatches are randomly sampled, they sometimes show significant skews in class distribution (this is objectively true for my dataset), and therefore I should “help” the criterion.

However, I wonder if I’m just overthinking the issue, or worse, leaving performance on the table.

Hi apytorch!

I think that you are just overthinking it.

Having said that, my intuition on the matter – without any evidence
to back it up – is:

  1. Class weights in the loss function are rough values to help
    things along a little bit. Your training process and results
    should be robust with respect to the exact values of the class
    weights you use.

  2. My feeling is that estimating your class weights from the entire
    training set, and then using the same weights for all batches,
    should be sufficient, and probably even preferable. It is true that
    (unless you construct your batches so that the number of samples
    from each class is the same from batch to batch) your class
    distributions will vary from batch to batch. But because your
    learning rate is “small,” your training process will, in effect,
    average over several batches, smoothing out the varying class
    distributions. This will be even more true if you use momentum.
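A quick toy simulation of the averaging effect described in point 2 (the class counts, batch size, and number of batches are made up): single batches drawn from a skewed dataset vary a lot, but the class distribution averaged across many batches converges back to the global one.

```python
import random

random.seed(0)
train_labels = [0] * 70 + [1] * 20 + [2] * 10   # skewed 3-class toy set

def batch_class_fracs(labels, batch_size, num_classes):
    """Class-frequency vector of one randomly sampled batch."""
    batch = random.sample(labels, batch_size)
    return [batch.count(c) / batch_size for c in range(num_classes)]

# a single batch can be quite skewed relative to [0.7, 0.2, 0.1] ...
single = batch_class_fracs(train_labels, 10, 3)

# ... but the average over many batches smooths that variation out
n = 500
avg = [0.0, 0.0, 0.0]
for _ in range(n):
    fracs = batch_class_fracs(train_labels, 10, 3)
    avg = [a + f / n for a, f in zip(avg, fracs)]

print(single)  # one batch's class fractions (noisy)
print(avg)     # close to the global distribution [0.7, 0.2, 0.1]
```

This is the intuition behind using one fixed, dataset-level weight vector: the optimizer effectively sees the average batch, not any individual skewed one.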


K. Frank
