I see, okay. One last question: does having an imbalanced problem affect the gradients that are learnt? I'm using CrossEntropyLoss() with only 10-20 labels per classification; the other samples are -100, the ignore index.
It will change the shape of the loss function for sure, but I don't think it should be a very large issue.
I see, okay. Because my samples are heavily skewed, with only about 10% having labels; most of the rest are the ignore index. I'll have a look to see whether the net can learn better with other loss functions. Thanks a lot for explaining!
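A minimal sketch of what is being discussed, assuming PyTorch: positions labeled with the ignore index (-100) are excluded from the loss entirely, so they contribute exactly zero gradient, and only the few labeled positions drive learning. The shapes and label positions below are made-up illustrations, not the original setup.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
num_classes = 5

# Fake logits for 8 positions; only two of them carry real labels,
# mimicking a heavily skewed labeling like the one described above.
logits = torch.randn(8, num_classes, requires_grad=True)
labels = torch.full((8,), -100, dtype=torch.long)  # -100 = ignore index
labels[0], labels[3] = 2, 4

loss = nn.CrossEntropyLoss(ignore_index=-100)(logits, labels)
loss.backward()

# Gradient magnitude per position: nonzero only at the labeled rows,
# identically zero at every ignored row.
grad_sums = logits.grad.abs().sum(dim=1)
print(grad_sums)
```

If the imbalance among the *labeled* classes themselves is a concern, `nn.CrossEntropyLoss` also accepts a per-class `weight` tensor that rescales each class's contribution, which is a common way to rebalance without changing the loss function entirely.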