For some reason, unknown to me now, at some point I switched the source of the CrossEntropyLoss weight vector.
originally: the 1/(class samples) for the entire training set
currently: the 1/(class samples) for the current minibatch
The current method has been working fine. And I can’t find any post online mentioning using the minibatch distribution. My after-the-fact justification is that since the minibatches are randomly sampled, sometimes there are significant skews in the class distributions (this is objectively true for my dataset), and therefore I should “help” the criterion.
However, I wonder if I’m just overthinking the issue, or worse, leaving performance on the table.
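For concreteness, here is a minimal sketch of the two weighting schemes (the label lists and the helper function are hypothetical, just for illustration). Either resulting list would be wrapped in a tensor and passed as the `weight` argument to `torch.nn.CrossEntropyLoss`.

```python
from collections import Counter

def inverse_frequency_weights(labels, num_classes):
    """Return 1 / (class sample count) per class.

    Classes absent from `labels` get weight 0.0 here; with the
    per-minibatch scheme you would have to decide how to handle
    classes that happen to be missing from a small batch.
    """
    counts = Counter(labels)
    return [1.0 / counts[c] if counts[c] > 0 else 0.0
            for c in range(num_classes)]

# Hypothetical labels, just for illustration.
train_labels = [0, 0, 0, 0, 1, 1, 2]  # the whole training set
batch_labels = [0, 1, 1]              # one randomly sampled minibatch

# "Original" scheme: weights computed once from the full training set.
global_weights = inverse_frequency_weights(train_labels, num_classes=3)

# "Current" scheme: weights recomputed from each minibatch.
batch_weights = inverse_frequency_weights(batch_labels, num_classes=3)

print(global_weights)  # [0.25, 0.5, 1.0]
print(batch_weights)   # [1.0, 0.5, 0.0] -- class 2 missing from the batch
# Either list would then be used as e.g.
# torch.nn.CrossEntropyLoss(weight=torch.tensor(global_weights))
```

Note the side effect visible in the output: under the per-minibatch scheme, a class absent from the batch gets an arbitrary weight, which is one practical argument for estimating weights once from the full training set.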
Having said that, my intuition on the matter – without any evidence
to back it up – is:
Class weights in the loss function are rough values to help
things along a little bit. Your training process and results
should be robust with respect to the exact values of the class
weights you use.
My feeling is that estimating your class weights from the entire
training set, and then using the same weights for all batches,
should be sufficient, and probably even preferable. It is true that
(unless you construct your batches so that the number of samples
from each class is the same from batch to batch) your class
distributions will vary from batch to batch. But because your
learning rate is “small,” your training process will, in effect,
average over several batches, smoothing out the varying class
distributions. This will be even more true if you use momentum.
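The smoothing claim above can be checked numerically. The sketch below (pure Python, with a made-up 80/20 imbalanced label set and an assumed batch size of 10) shows that while individual randomly sampled batches have noticeably skewed class fractions, the average over many batches matches the global distribution:

```python
import random

random.seed(0)

# Hypothetical imbalanced dataset: 80% class 0, 20% class 1.
labels = [0] * 80 + [1] * 20

batch_size = 10
fractions = []  # fraction of class-0 samples in each batch
for _ in range(200):  # 200 epochs of random shuffling
    random.shuffle(labels)
    for i in range(0, len(labels), batch_size):
        batch = labels[i:i + batch_size]
        fractions.append(batch.count(0) / len(batch))

# Individual batches are noisy (some are heavily skewed)...
print(min(fractions), max(fractions))

# ...but the mean over many batches matches the global 0.8,
# which is the "averaging over several batches" effect.
print(sum(fractions) / len(fractions))
```

This is only a statement about the class distributions the loss sees, not a full training simulation, but it illustrates why fixed global weights plus a small learning rate should behave like per-batch weights on average, without the per-batch noise.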