Background
In a binary classification problem on a dataset consisting of sequences of events, performance varies across model types:
- for the raw sequences of events, an LSTM gives an average precision score of APS = 0.60
- for features engineered with summary stats (first/last occurrence, overall frequency, etc.), LightGBM gives APS = 0.80
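To make the second setup concrete, here is a minimal sketch of how summary features of this kind could be computed from a single event sequence. The function name `summary_features`, the feature names, and the `-1` placeholder for tokens that never occur are assumptions for illustration only, not the actual pipeline used here.

```python
from collections import Counter


def summary_features(events, vocab):
    """Collapse one event sequence into fixed-length summary statistics.

    Hypothetical helper: feature names and the -1 placeholder are
    illustrative choices, not taken from the original pipeline.
    """
    n = len(events)
    counts = Counter(events)
    first, last = {}, {}
    for pos, tok in enumerate(events):
        first.setdefault(tok, pos)  # keep only the earliest position
        last[tok] = pos             # overwrite so the latest position wins

    feats = {"seq_len": n}
    for tok in vocab:
        # Relative frequency of the token over the whole sequence.
        feats[f"{tok}_freq"] = counts[tok] / n if n else 0.0
        # Position of first and last occurrence, -1 if the token is absent.
        feats[f"{tok}_first"] = first.get(tok, -1)
        feats[f"{tok}_last"] = last.get(tok, -1)
    return feats


example = summary_features(["a", "b", "a", "c"], vocab=["a", "b", "c", "d"])
```

With fewer than 100 unique tokens, a table like this stays at a few hundred columns, which a gradient-boosted tree model can consume directly.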
Dataset specs
- The dataset is highly imbalanced (ratio=0.1).
- Number of unique tokens is <100
Questions
- Is there an obvious reason for this performance gap that I am missing?
- More importantly, what would be a way to combine the LSTM and LightGBM models?