Combining time-invariant information with sequence data

Hello everyone,

thanks for taking the time to read this.

I’m currently building an LSTM-based sequence classifier, and I’m happy with my progress so far (yes, this is my first real project with PyTorch and deep learning in general).
However, there’s some auxiliary information belonging to each sequence that I’d also like to take into account during classification. I see four ways to do it, and I was hoping you could help me decide which way to go:

  1. Prepend a pseudo time step containing the auxiliary information to the actual sequence.
  2. Append a pseudo time step containing the auxiliary information to the actual sequence.
  3. Enrich every step of the sequence with the auxiliary information (a rough sketch of how the inputs for 1–3 could look follows this list).
  4. Combine two models, i.e. use the LSTM as is and build a small classifier (a few fully connected layers) for the aux info. Finally, torch.cat the outputs of both and use an FC layer for the final output.
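
For reference, here is a rough sketch of how the inputs for options 1–3 could be built. The tensor names and shapes are made up for illustration:

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes: a batch of 8 sequences, 1000 steps, 6 features per step,
# plus 4 time-invariant auxiliary features per sequence.
seq = torch.randn(8, 1000, 6)  # (batch, time, features)
aux = torch.randn(8, 4)        # (batch, aux_features)

# Options 1/2: zero-pad aux to the per-step feature size and use it as a pseudo step.
pseudo = F.pad(aux, (0, seq.size(2) - aux.size(1)))       # (8, 6)
prepended = torch.cat([pseudo.unsqueeze(1), seq], dim=1)  # (8, 1001, 6)
appended = torch.cat([seq, pseudo.unsqueeze(1)], dim=1)   # (8, 1001, 6)

# Option 3: repeat aux along the time axis and widen every step with it.
aux_steps = aux.unsqueeze(1).expand(-1, seq.size(1), -1)  # (8, 1000, 4)
enriched = torch.cat([seq, aux_steps], dim=2)             # (8, 1000, 10)
```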

Thoughts so far:
regarding 1: From what I can tell, this is similar to the CNN → RNN architecture I’ve been reading about, where the output of a CNN is fed into the RNN and the text description follows. My sequences may be fairly long (~1000 time steps), and I’m somewhat concerned that some important information may get lost over that distance. I’m probably wrong to worry, but it is what it is.
regarding 2: similar to 1, except I’m less concerned about information getting lost.
regarding 3: the RNN wouldn’t have to remember anything, since the info is given at every step. However, this seems somewhat against the nature of RNNs.
regarding 4: major drawback: a more complicated structure with more variables and tunables. Training may also be harder; perhaps I’d have to train piecewise, i.e. first the LSTM on the sequence, then the little helper classifier on the aux input, and finally freeze both and train the last connecting layer. I’d rather train everything at once. On the other hand, this variant doesn’t have the problem that important info may be overlooked by the LSTM.

Where I’m at: based on a toy problem, I got 1 and 4 to work; 2 and 3 I haven’t tried yet. However, I have no idea which one is better. Perhaps I’m overlooking something.

The more I think about it, the more I’m leaning towards version 1. Can anyone help me decide? Am I missing something? Is there anything I could look at to help me decide?

Any help is appreciated! Thank you

Perhaps to make the setting easier to understand: consider a sequence that was recorded somewhere. The aux info would then be where on earth the recording happened, the outside temperature, etc.

I cannot tell for sure which solution will get you the best result, but I would argue that (4) is the most intuitive way to do this. Your auxiliary data is not part of the sequence in the sense that it’s not a time step within the sequence.

Hence I would definitely (a) push the sequence through the LSTM and take the last hidden state, (b) push the auxiliary information through some linear layers (or a CNN, depending on the nature of your data) to get an auxiliary representation, and lastly (c) concatenate the hidden state from the LSTM with the output of the “auxiliary” linear layers and push this concatenated tensor through some more linear layers.
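
In case it helps, here is a minimal sketch of that two-branch layout. The hidden sizes, feature counts, and number of classes are all made-up placeholders:

```python
import torch
import torch.nn as nn

class SeqWithAuxClassifier(nn.Module):
    """Sketch of variant (4); all sizes here are placeholder choices."""

    def __init__(self, seq_features=6, aux_features=4,
                 lstm_hidden=64, aux_hidden=16, num_classes=3):
        super().__init__()
        self.lstm = nn.LSTM(seq_features, lstm_hidden, batch_first=True)
        self.aux_net = nn.Sequential(
            nn.Linear(aux_features, aux_hidden),
            nn.ReLU(),
        )
        self.head = nn.Linear(lstm_hidden + aux_hidden, num_classes)

    def forward(self, seq, aux):
        # (a) the last hidden state of the LSTM summarizes the sequence
        _, (h_n, _) = self.lstm(seq)   # h_n: (num_layers, batch, lstm_hidden)
        seq_repr = h_n[-1]             # (batch, lstm_hidden)
        # (b) auxiliary info through a small linear stack
        aux_repr = self.aux_net(aux)   # (batch, aux_hidden)
        # (c) concatenate both and classify; output is raw logits
        return self.head(torch.cat([seq_repr, aux_repr], dim=1))
```

Since gradients flow through both branches, this trains end-to-end with a single loss, so no piecewise training or freezing should be needed.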

Thanks for your answer.

Yes, my current approach is implemented exactly the way you describe. It’s always very reassuring to see that it makes sense to someone other than me. :slight_smile:

Something I did notice is that the “final” linear layers should not end in an activation function. Based on some toy data, I found that it works much better without one.
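
I suspect that’s expected rather than a quirk of my data, assuming a classification setup trained with nn.CrossEntropyLoss: that loss applies log-softmax internally and expects raw logits, so an extra activation on the last layer would only get in the way.

```python
import torch.nn as nn

# nn.CrossEntropyLoss applies log-softmax internally and expects raw logits,
# so the network's last layer should be a plain nn.Linear with no activation.
criterion = nn.CrossEntropyLoss()
# loss = criterion(logits, labels)  # logits: (batch, num_classes), labels: (batch,)
```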

Anyhow, thanks again for sharing your thoughts.