FFN versus RNN models

I am building models for chronic disease prediction (binary classification) using sequences of diagnosis, procedure, and medication codes collected over a two-year period, eight years before the disease is confirmed.

Example: training set of 50k observations, 1,800 distinct codes (features).
Sequence-length percentiles: 25th = 12 codes, 50th = 30 codes, 75th = 71 codes.

Model 1: FFN

  • computed a count matrix whose columns are the features (codes) and whose values are the number of occurrences of each code, then built a vanilla FFN on it
  • ran an extensive grid search. The best architecture has 4 hidden layers with 127 neurons each, ReLU activations, and dropout of 0.6 between every pair of layers. Parameters: 279,785; F1: 27.6%; training time/epoch: 2.67 s on a V100 (32 GB). (A minimal sketch follows this list.)
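
Roughly, Model 1 looks like this (a minimal PyTorch sketch, not the exact code; the layer sizes follow the grid-search result above, and names such as CodeCountFFN, X_gpu, and y_gpu are only illustrative):

    import torch
    import torch.nn as nn

    class CodeCountFFN(nn.Module):
        """Vanilla FFN over a (n_obs, n_codes) count matrix; sizes follow the post."""
        def __init__(self, n_features=1800, hidden=127, n_layers=4, p_drop=0.6):
            super().__init__()
            layers, in_dim = [], n_features
            for _ in range(n_layers):
                layers += [nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(p_drop)]
                in_dim = hidden
            layers.append(nn.Linear(in_dim, 1))      # single logit for binary classification
            self.net = nn.Sequential(*layers)

        def forward(self, x):                        # x: (batch, n_features) code counts
            return self.net(x).squeeze(-1)

    # usage with everything already on the GPU, as described above:
    # dataset = torch.utils.data.TensorDataset(X_gpu, y_gpu)   # X_gpu: float counts, y_gpu: 0/1 labels
    # loader  = torch.utils.data.DataLoader(dataset, batch_size=256, shuffle=True)
    # loss_fn = nn.BCEWithLogitsLoss()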

Model 2: RNN

  • sorted observations (and labels) by sequence length, from shortest to longest
  • pipeline: (pack_sequence) => embed => pack_padded_sequence (batch_first=True) => LSTM => FFN head (see the sketch after this list)
  • grid search. Best architecture: embedding dim 50, LSTM hidden dim 255, FC layer widths 63 and 31, dropout 0.6. Parameters: 422,187; F1: 26.2%; training time/epoch: 55 s on the same GPU.
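
And roughly what Model 2 looks like (again a minimal sketch, not the exact code: sequences are padded in the collate_fn, embedded, packed, and classified from the LSTM's final hidden state; the dimensions follow the grid-search result and names like collate_codes and CodeSeqLSTM are illustrative):

    import torch
    import torch.nn as nn
    from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

    def collate_codes(batch):
        # batch: list of (LongTensor of code ids, label); code ids assumed to start at 1
        seqs, labels = zip(*batch)
        lengths = torch.tensor([len(s) for s in seqs])
        padded = pad_sequence(seqs, batch_first=True)           # (batch, max_len), padded with 0
        return padded, lengths, torch.tensor(labels, dtype=torch.float)

    class CodeSeqLSTM(nn.Module):
        def __init__(self, n_codes=1800, embed_dim=50, hidden=255, p_drop=0.6):
            super().__init__()
            self.embed = nn.Embedding(n_codes + 1, embed_dim, padding_idx=0)
            self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True)
            self.head = nn.Sequential(
                nn.Linear(hidden, 63), nn.ReLU(), nn.Dropout(p_drop),
                nn.Linear(63, 31), nn.ReLU(), nn.Dropout(p_drop),
                nn.Linear(31, 1),
            )

        def forward(self, padded, lengths):
            emb = self.embed(padded)                             # (batch, max_len, embed_dim)
            packed = pack_padded_sequence(emb, lengths.cpu(),
                                          batch_first=True, enforce_sorted=False)
            _, (h_n, _) = self.lstm(packed)                      # h_n: (1, batch, hidden)
            return self.head(h_n[-1]).squeeze(-1)                # one logit per sequence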

Questions

  • the FFN has roughly two-thirds as many trainable parameters (279,785 vs 422,187), yet it trains more than 20x faster per epoch. I realize parameter count may mean little given the algorithmic differences (the LSTM has to step through each sequence position in order), but does a gap that large make sense? In both cases the entire dataset sits on the GPU (the CPU is not used for anything). For the FFN I use the TensorDataset class; for the LSTM, a Dataset with the DataLoader's collate_fn suitably modified. Any ideas for speeding up the training? (A length-bucketing sketch follows these questions.)

  • I didn't expect the order of the codes, eight years out, to matter as much as which codes occur, but I did think the results would be closer. I tried two stacked LSTMs, a bidirectional LSTM, etc. Is there any other architecture change worth looking at? Attention? (An attention-pooling sketch also follows below.)
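
For the speed question, one option I could try is length-bucketed batching: build each batch from similar-length sequences so it pads to roughly its own maximum length instead of the global maximum, cutting wasted LSTM steps on padding. A sketch, assuming a list `lengths` of per-observation sequence lengths and the collate_codes above (LengthBucketSampler and the other names are illustrative):

    import random
    import torch
    from torch.utils.data import DataLoader, Sampler

    class LengthBucketSampler(Sampler):
        """Yields batches of indices grouped by sequence length (similar lengths together)."""
        def __init__(self, lengths, batch_size):
            self.order = sorted(range(len(lengths)), key=lambda i: lengths[i])
            self.batch_size = batch_size

        def __iter__(self):
            batches = [self.order[i:i + self.batch_size]
                       for i in range(0, len(self.order), self.batch_size)]
            random.shuffle(batches)       # shuffle whole batches so buckets stay intact
            return iter(batches)

        def __len__(self):
            return (len(self.order) + self.batch_size - 1) // self.batch_size

    # loader = DataLoader(dataset, batch_sampler=LengthBucketSampler(lengths, 256),
    #                     collate_fn=collate_codes)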
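
On the architecture question, the concrete attention variant I have in mind is attention pooling over the LSTM outputs, so the classifier sees a weighted combination of all time steps rather than only the final hidden state. A sketch that assumes the CodeSeqLSTM above (the module name and wiring are illustrative, and I have no evidence yet that it would beat the FFN):

    import torch
    import torch.nn as nn
    from torch.nn.utils.rnn import pad_packed_sequence

    class AttentionPooling(nn.Module):
        """Additive attention over LSTM outputs; padding positions are masked out."""
        def __init__(self, hidden):
            super().__init__()
            self.score = nn.Linear(hidden, 1)

        def forward(self, outputs, mask):
            # outputs: (batch, max_len, hidden); mask: (batch, max_len), True at real codes
            scores = self.score(outputs).squeeze(-1)              # (batch, max_len)
            scores = scores.masked_fill(~mask, float("-inf"))
            weights = torch.softmax(scores, dim=-1)               # attention weight per time step
            return (weights.unsqueeze(-1) * outputs).sum(dim=1)   # (batch, hidden)

    # wiring it into the LSTM sketch above (inside forward), instead of using h_n:
    #   packed_out, _ = self.lstm(packed)
    #   outputs, _ = pad_packed_sequence(packed_out, batch_first=True)
    #   mask = (torch.arange(outputs.size(1), device=outputs.device)[None, :]
    #           < lengths.to(outputs.device)[:, None])
    #   logit = self.head(self.attn_pool(outputs, mask)).squeeze(-1)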