Hello everyone,
thanks for taking the time to read this.
I’m currently building an LSTM-based sequence classifier and I’m happy with my progress so far (yes, this is my first real project with PyTorch and deep learning in general).
However, there’s some auxiliary information that belongs to each sequence, which I’d also like to take into account during classification. I see four ways to do it and I was hoping you could help me decide which way to go:
1. Prepend a pseudo time step that contains the auxiliary information to the actual sequence.
2. Append a pseudo time step that contains the auxiliary information to the actual sequence.
3. Enrich every step of the sequence with the auxiliary information.
4. Combine two models, i.e. use the LSTM as is and build a small classifier (a few fully connected layers) for the auxiliary information. Finally, torch.cat the outputs of both and use an FC layer for the final output.
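To make the first three options concrete, here is a rough sketch of how the input tensors could be built. All sizes and names (`seq`, `aux`, `aux_to_step`) are made up for illustration, and projecting the auxiliary vector to the per-step feature size with a linear layer is only one assumption about how a pseudo time step could be formed.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
batch, steps, feat, aux_dim = 4, 1000, 8, 3
seq = torch.randn(batch, steps, feat)   # (batch, time, features)
aux = torch.randn(batch, aux_dim)       # one auxiliary vector per sequence

# Options 1 and 2 need the auxiliary vector to look like a time step.
# Assumption: project it to the per-step feature size with a linear layer
# (zero-padding it to `feat` entries would be another possibility).
aux_to_step = nn.Linear(aux_dim, feat)
pseudo_step = aux_to_step(aux).unsqueeze(1)          # (batch, 1, feat)

x_prepend = torch.cat([pseudo_step, seq], dim=1)     # option 1: (batch, steps + 1, feat)
x_append = torch.cat([seq, pseudo_step], dim=1)      # option 2: (batch, steps + 1, feat)

# Option 3: repeat the auxiliary vector along the time axis and
# concatenate it to every step on the feature axis.
aux_tiled = aux.unsqueeze(1).expand(-1, steps, -1)   # (batch, steps, aux_dim)
x_enrich = torch.cat([seq, aux_tiled], dim=2)        # (batch, steps, feat + aux_dim)
```

With options 1 and 2 the LSTM keeps its input size and only the sequence length grows by one; with option 3 the LSTM input size becomes `feat + aux_dim`.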
Thoughts so far:
Regarding 1: From what I can tell, this is similar to the CNN → RNN architecture I’ve been reading about, where the output of a CNN is fed into the RNN and the text description follows. My sequences may be fairly long (~1000 time steps), and I’m somewhat concerned that the auxiliary information from the first step may get lost by the end of the sequence. The concern may be unfounded, but it’s there nonetheless.
Regarding 2: Similar to 1, except that I’m less concerned about information getting lost.
Regarding 3: The RNN wouldn’t have to remember anything, since the information is given at every step. However, this seems somewhat against the nature of RNNs.
Regarding 4: The major drawback is a more complicated structure with more variables and tunables. Training may also be harder. I might have to train piecewise, i.e. first the LSTM on the sequence, then the little helper classifier on the auxiliary input, and finally freeze both and train the last connecting layer. I’d rather train everything at once. On the other hand, this variant doesn’t have the problem that important information may be overlooked by the LSTM.
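As a rough illustration of option 4 trained all at once rather than piecewise, the sketch below puts both branches and the connecting layer into one module and runs a single optimizer step over all parameters. The class name `SeqWithAuxClassifier`, the layer sizes, and the choice of the last hidden state as the sequence summary are assumptions for the example, not taken from an actual project.

```python
import torch
import torch.nn as nn


class SeqWithAuxClassifier(nn.Module):
    """Hypothetical two-branch model: LSTM for the sequence, small MLP for aux input."""

    def __init__(self, feat, aux_dim, hidden, aux_hidden, n_classes):
        super().__init__()
        self.lstm = nn.LSTM(feat, hidden, batch_first=True)
        self.aux_net = nn.Sequential(nn.Linear(aux_dim, aux_hidden), nn.ReLU())
        self.head = nn.Linear(hidden + aux_hidden, n_classes)

    def forward(self, seq, aux):
        _, (h_n, _) = self.lstm(seq)      # h_n: (num_layers, batch, hidden)
        seq_repr = h_n[-1]                # final hidden state of the last layer
        aux_repr = self.aux_net(aux)      # (batch, aux_hidden)
        return self.head(torch.cat([seq_repr, aux_repr], dim=1))


torch.manual_seed(0)
model = SeqWithAuxClassifier(feat=8, aux_dim=3, hidden=16, aux_hidden=4, n_classes=2)

# Random stand-in data; shapes are illustrative only.
seq, aux = torch.randn(4, 50, 8), torch.randn(4, 3)
labels = torch.randint(0, 2, (4,))

# One optimizer over all parameters, so both branches and the head are updated together.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
logits = model(seq, aux)
loss = nn.functional.cross_entropy(logits, labels)
optimizer.zero_grad()
loss.backward()    # gradients reach the LSTM, the aux branch, and the head
optimizer.step()
```

This only shows that joint training is mechanically straightforward; whether it converges as well as piecewise training on real data would still have to be checked.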
Where I’m at: On a toy problem, I got 1 and 4 to work. I haven’t tried 2 and 3 yet. However, I have no idea which one is better. Perhaps I’m overlooking something.
The more I think about it, the more I’m leaning towards version 1. Can anyone help me decide? Am I missing something? Is there anything I could look at to help me decide?
Any help is appreciated! Thank you.