I apologize if this is not PyTorch-specific enough.
I have a fairly imbalanced dataset for a time series classification problem: several classes, where the most common class makes up about 50% of the data and the smallest class is roughly 1/8 as common as the most common one. Unfortunately, because of the continuous nature of the data, and because I want to use the entire time series of each sample as the input (contextual information is expected to be very important for classification), I cannot over- or undersample at all.
Therefore I set the class weights in the loss to the inverse of each class's prevalence in the training set (i.e. largest weight 8 for the rarest class, smallest weight 1 for the most common). I should note that I am still adding capacity, because I can't seem to really overfit the data yet (although huge models just underfit).
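For concreteness, here's roughly what the weighting looks like; the class counts below are invented just to show the ratios:

```python
import torch
import torch.nn as nn

# Illustrative class counts -- not my real data, just matching the ratios
# (most common class ~50% of the total, rarest ~1/8 of the most common).
class_counts = torch.tensor([4000.0, 2000.0, 1500.0, 500.0])

# Inverse-prevalence weights, scaled so the most common class gets
# weight 1 and the rarest gets weight 8.
class_weights = class_counts.max() / class_counts

criterion = nn.CrossEntropyLoss(weight=class_weights)
```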
I noticed that the most common class was getting a lower accuracy than the other classes. So, as a let's-try-it-and-see test, I trained a separate model to classify just that one class vs. the rest, which achieved ~80% accuracy. I then fed this prediction into the first model as an extra feature for each chunk of time that has to be classified. So the model should obviously have a lot more information about how to classify that particular class. However, the loss stops at pretty much exactly the same value, with basically the same confusion matrix.
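To be concrete, the feature injection looks roughly like this (the shapes and the two `nn.Linear` stand-ins are just placeholders for my actual architectures):

```python
import torch
import torch.nn as nn

# Placeholder models standing in for my actual architectures.
binary_model = nn.Linear(16, 1)   # one-vs-rest model for the majority class
main_model = nn.Linear(17, 4)     # main classifier, now taking 16 + 1 features

# Placeholder batch: (batch, time, features).
x = torch.randn(32, 500, 16)

# The binary model's per-timestep probability for the majority class is
# detached and appended as one extra input feature for the main model.
with torch.no_grad():
    majority_prob = torch.sigmoid(binary_model(x))   # (32, 500, 1)

x_augmented = torch.cat([x, majority_prob], dim=-1)  # (32, 500, 17)
logits = main_model(x_augmented)                     # (32, 500, 4)
```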
I'm flummoxed as to where to go from here. The class weights themselves seem to be what's limiting me, but when I've tried training unweighted, every class except the most common takes too much of a hit. I've thought about building separate models for each class (see the sketch below), but even if each individual model does well, there still has to be a "decider" model that takes their outputs, which will again have to deal with the imbalance.
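In sketch form, the per-class idea would be something like this (dimensions and the linear stand-ins are again placeholders):

```python
import torch
import torch.nn as nn

# One binary scorer per class, plus a small "decider" that combines their
# per-timestep scores -- the decider still has to be trained against the
# same class imbalance, which is my concern.
num_classes = 4
binary_models = nn.ModuleList(nn.Linear(16, 1) for _ in range(num_classes))
decider = nn.Linear(num_classes, num_classes)

x = torch.randn(32, 500, 16)                                       # (batch, time, features)
scores = torch.cat([torch.sigmoid(m(x)) for m in binary_models], dim=-1)
logits = decider(scores)                                           # (32, 500, 4)
```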