Sequential and static features combined in LSTM architecture

Rationale
I am doing sentiment classification (reviews classification) using an LSTM model and I want to enhance performance by allowing the model to consume static features, such as age and gender.

Aim
My aim is to introduce these static auxiliary features outside of the LSTM by means of additional fully connected layers. The data flow would look like this:

SEQUENCE_INPUTS ------> LSTM -------->
                                      |---> MERGE ---> SIGMOID
STATIC_INPUTS -----> Preprocessing -->

Question
I am struggling with how to batch the data efficiently for this task. Should the sequential and static features be batched separately, or should all feature types be packaged together? It would be very helpful to see similar implementations from the community.

Yes, I would simply split the batch: one for the sequences and one for the static features. You only need to ensure that the batch sizes are the same…obviously :).

Say you have a batch of sequences seq_batch with shape (batch_size, seq_len) and a batch of static features static_batch with shape (batch_size, feature_dim). Your forward() method might then look like this:

def forward(self, seq_batch, static_batch):
    # Handle sequences
    X1 = self.embedding(seq_batch) # I assume you deal with text
    # X1.shape = (batch_size, seq_len, embed_dim)
    output, (h, c) = self.lstm(X1, self.hidden) # assumes nn.LSTM(..., batch_first=True)
    X1 = h[-1] # I assume unidirectional LSTM to keep example simple
    # X1.shape = (batch_size, hidden_dim)
    # Handle static features
    X2 = self.fc1(static_batch) # or more linear layers
    # X2.shape = (batch_size, output_dim_fc1)
    X = torch.cat([X1, X2], dim=1)
    # X.shape = (batch_size, hidden_dim + output_dim_fc1)
    X = self.fc2(X) # or more linear layers
    # X.shape = (batch_size, output_dim_fc2)
    # Last layers to get the right output for your loss
    ...

If you stack multiple linear layers in sequence, you of course need activation functions between them. I also omitted any Dropout or BatchNorm layers, etc. to keep it simple.
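
For completeness, here is a minimal self-contained sketch of a module built around a forward() like the one above. The class name SeqStaticClassifier, the layer sizes, and the single ReLU on the static branch are my own assumptions for illustration, not something from the original post. It returns raw logits, so the sigmoid from the diagram is applied outside (or folded into nn.BCEWithLogitsLoss during training).

```python
import torch
import torch.nn as nn


class SeqStaticClassifier(nn.Module):
    """LSTM branch for token sequences plus a linear branch for static features.

    Hypothetical example: all names and sizes are illustrative assumptions.
    """

    def __init__(self, vocab_size, embed_dim, hidden_dim, static_dim, static_hidden):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # batch_first=True so inputs are (batch_size, seq_len, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc1 = nn.Linear(static_dim, static_hidden)
        self.fc2 = nn.Linear(hidden_dim + static_hidden, 1)

    def forward(self, seq_batch, static_batch):
        x1 = self.embedding(seq_batch)            # (batch_size, seq_len, embed_dim)
        _, (h, _) = self.lstm(x1)                 # h: (num_layers, batch_size, hidden_dim)
        x1 = h[-1]                                # last layer's final state: (batch_size, hidden_dim)
        x2 = torch.relu(self.fc1(static_batch))   # (batch_size, static_hidden)
        x = torch.cat([x1, x2], dim=1)            # (batch_size, hidden_dim + static_hidden)
        return self.fc2(x).squeeze(1)             # logits: (batch_size,)


# Tiny smoke test with random data (4 reviews, 12 token ids each, 2 static features)
model = SeqStaticClassifier(vocab_size=100, embed_dim=16, hidden_dim=32,
                            static_dim=2, static_hidden=8)
seq_batch = torch.randint(0, 100, (4, 12))
static_batch = torch.randn(4, 2)
logits = model(seq_batch, static_batch)
probs = torch.sigmoid(logits)                     # the SIGMOID step from the diagram
```

Here the initial hidden state is left to the LSTM default (zeros) rather than passed in explicitly, which keeps the sketch independent of the batch size.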

Thanks, this worked.

For info, as far as split batching goes, I used sklearn.model_selection.train_test_split(*arrays, **options) and torch.utils.data.TensorDataset, both of which accept a variable number of arrays/tensors. The implementation looks like this:

import numpy as np
import torch
from sklearn.model_selection import train_test_split
from torch.utils.data import DataLoader, TensorDataset

# train/test data split (all arrays are split with the same row indices)
seq_train, seq_test, static_train, static_test, train_y, test_y = train_test_split(
    sequences, static_features, y, train_size=0.666, random_state=666)

# create tensor dataset
train_data = TensorDataset(
    torch.from_numpy(seq_train),
    torch.from_numpy(static_train),
    torch.from_numpy(np.array(train_y)),
)

# create dataloader
train_loader = DataLoader(
    train_data, shuffle=True,
    batch_size=666, drop_last=True)
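
In case it helps anyone else, here is a small sketch of how such a loader can be consumed. Each iteration yields one tensor per array passed to TensorDataset, already aligned row by row, so the sequence and static parts of a batch always belong to the same samples. The data below is synthetic and the shapes (10 samples, 12 tokens, 2 static features, batch size 4) are assumptions purely for illustration.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-ins for the real arrays (assumed shapes, illustration only)
seq_train = np.random.randint(0, 100, size=(10, 12)).astype(np.int64)
static_train = np.random.rand(10, 2).astype(np.float32)
train_y = np.random.randint(0, 2, size=10).astype(np.float32)

train_data = TensorDataset(
    torch.from_numpy(seq_train),
    torch.from_numpy(static_train),
    torch.from_numpy(train_y),
)
train_loader = DataLoader(train_data, shuffle=True, batch_size=4, drop_last=True)

batch_shapes = []
for seq_batch, static_batch, y_batch in train_loader:
    # The three tensors share the same first dimension and the same sample order,
    # so they can be passed straight to model(seq_batch, static_batch).
    batch_shapes.append(
        (tuple(seq_batch.shape), tuple(static_batch.shape), tuple(y_batch.shape))
    )
```

With drop_last=True the incomplete final batch is discarded, so every batch has exactly batch_size rows.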