RNNs and imbalanced data

Standard accuracy scores mean very little with a highly unbalanced dataset.

If you have an 80:20 class distribution, you can get 80% accuracy under standard scoring just by writing a few lines of code that always return the majority (80%) class.
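To make that concrete, here is a minimal sketch of such a "classifier". The dataset and the name majority_classifier are made up for illustration; the point is only that plain accuracy looks fine while the minority class is never predicted.

```python
# Hypothetical 80:20 dataset: 80 samples of class 0, 20 samples of class 1.
labels = [0] * 80 + [1] * 20


def majority_classifier(_sample):
    """Ignore the input entirely and always predict the majority class (0)."""
    return 0


# The inputs do not matter here, so the labels stand in for the samples.
predictions = [majority_classifier(sample) for sample in labels]

# Plain accuracy: fraction of all samples predicted correctly.
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Accuracy restricted to the minority class (true label 1).
minority_hits = sum(p == y for p, y in zip(predictions, labels) if y == 1)
minority_accuracy = minority_hits / labels.count(1)

print(accuracy)           # 0.8
print(minority_accuracy)  # 0.0
```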

A more informative metric is class-specific accuracy, or the mean of the per-class accuracies.
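A minimal sketch of a metric along these lines, in plain Python. The function name class_accuracy and its target_class parameter are my own choices for illustration, and "class accuracy" is assumed to mean the fraction of samples with a given true label that were predicted correctly.

```python
from collections import defaultdict


def class_accuracy(predictions, labels, target_class=None):
    """Return the accuracy for one class, or the mean of per-class accuracies.

    Per-class accuracy for class c is taken to be: of the samples whose true
    label is c, the fraction that were predicted as c.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, label in zip(predictions, labels):
        total[label] += 1
        if pred == label:
            correct[label] += 1

    per_class = {c: correct[c] / total[c] for c in total}

    if target_class is not None:
        return per_class[target_class]
    # Unweighted mean, so every class counts equally regardless of its size.
    return sum(per_class.values()) / len(per_class)


# The always-predict-majority model on an 80:20 dataset:
labels = [0] * 80 + [1] * 20
always_majority = [0] * 100

print(class_accuracy(always_majority, labels, target_class=0))  # 1.0
print(class_accuracy(always_majority, labels, target_class=1))  # 0.0
print(class_accuracy(always_majority, labels))                  # 0.5
```

With this scoring the always-majority model drops from 80% to 50%, which is what you would expect from a model that has learned nothing about the minority class.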

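Regarding training on an unbalanced dataset, many loss functions let you pass in a class-weighting argument. In PyTorch, for example, nn.CrossEntropyLoss takes a weight argument (one weight per class, for multi-class problems) and nn.BCEWithLogitsLoss takes a pos_weight argument (a weight on the positive class, for binary or multi-label problems). These increase the loss contribution of the minority classes and/or decrease the loss contribution of the majority classes.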
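A rough sketch of how those arguments might be set, assuming PyTorch and an 80:20 split. The counts, the inverse-frequency weighting scheme, and the random logits are all illustrative assumptions; in practice the logits would come from your RNN's output layer and you may want to tune the weights.

```python
import torch
import torch.nn as nn

# Hypothetical class counts for an 80:20 problem.
class_counts = torch.tensor([80.0, 20.0])

# Multi-class case: one weight per class, passed as `weight`.
# Inverse-frequency weighting is one common choice, not the only one.
class_weights = class_counts.sum() / (len(class_counts) * class_counts)
ce_loss = nn.CrossEntropyLoss(weight=class_weights)

logits = torch.randn(4, 2)            # stand-in for model output: (batch, num_classes)
targets = torch.tensor([0, 0, 0, 1])  # class indices
loss_multi = ce_loss(logits, targets)

# Binary / multi-label case: weight on the positive class, passed as `pos_weight`.
# A common heuristic is negatives / positives, here 80 / 20 = 4.
pos_weight = torch.tensor([80.0 / 20.0])
bce_loss = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

binary_logits = torch.randn(4)        # one raw logit per sample
binary_targets = torch.tensor([0.0, 0.0, 0.0, 1.0])
loss_binary = bce_loss(binary_logits, binary_targets)

print(class_weights)  # tensor([0.6250, 2.5000])
```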
