One-to-many LSTM with variable length sequences

I want to build an LSTM model that takes a state S0 as input and outputs a sequence S1, S2, …, Sn. The length of the output sequence is variable. Three more points:

1- Each state in the sequence depends on the previous one. So, let’s say S1 leads to S2, then S2 leads to S3, and at some point there should be a way to decide to stop, for example at state Sn.
Alternatively, the output sequence length can be fixed, and after some point the state should simply stop changing (i.e., Sn leads to Sn again). Which one is easier to implement, and how?

2- Ideally, we should be able to start from S2 and get S3 and so on. I guess this behavior is similar to the return_sequences=True flag in Keras. Should I train the network on all possible subsequences, or is there a way to learn this only from the main sequence?

3- Each state is a vector of dimension 100. The first 20 dimensions (let’s call them the ID) are fixed throughout a sequence (the IDs of different sequences differ from each other, but the ID stays unchanged within a sequence). How is it possible to keep this fixed within the LSTM?

  1. PyTorch should make it really easy for you to implement either alternative, but the first technique seems easier. You can simply have a termination condition, and keep generating newer outputs in a sequence until that termination condition is met. You could then compute the loss and backprop (see the first sketch after this list).

  2. This really depends on the kind of data you’re using. To me it seems like you could sample subsequences of varying lengths and train on those. (This isn’t really a PyTorch question; there’s a small sampling sketch after this list.)

  3. LSTM hidden states change due to updates from whatever gradient-based learning scheme you end up using. I see no reason why these first 20 dimensions should be part of a hidden state if these are not learnable. It’s not possible to keep these fixed in standard RNN/LSTM implementations.
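
For point 1, here is a minimal sketch of what I mean, using an `nn.LSTMCell` that maps the current state to the next one plus a learned stop signal. The names, sizes, the 0.5 threshold, and the max-step cap are all my own choices, not anything from your setup:

```python
import torch
import torch.nn as nn

STATE_DIM = 100   # dimension of each state S_t (from the question)
HIDDEN_DIM = 128  # hidden size, arbitrary choice
MAX_STEPS = 50    # hard cap so generation always terminates

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.cell = nn.LSTMCell(STATE_DIM, HIDDEN_DIM)
        self.to_state = nn.Linear(HIDDEN_DIM, STATE_DIM)  # predicts S_{t+1}
        self.to_stop = nn.Linear(HIDDEN_DIM, 1)           # "should we stop?" logit

    def forward(self, s0):
        # s0: (batch, STATE_DIM), the initial state S0
        h = s0.new_zeros(s0.size(0), HIDDEN_DIM)
        c = torch.zeros_like(h)
        s_t, states, stop_logits = s0, [], []
        for _ in range(MAX_STEPS):
            h, c = self.cell(s_t, (h, c))
            s_t = self.to_state(h)      # next state in the sequence
            stop = self.to_stop(h)      # decision to terminate
            states.append(s_t)
            stop_logits.append(stop)
            # at inference time, stop once the stop signal fires
            # (assumes batch size 1 here, just to keep the sketch short)
            if not self.training and torch.sigmoid(stop).item() > 0.5:
                break
        # (batch, steps, STATE_DIM) and (batch, steps, 1)
        return torch.stack(states, dim=1), torch.stack(stop_logits, dim=1)
```

During training you could supervise `stop_logits` with a target of 1 at the final step and 0 elsewhere (e.g. with `BCEWithLogitsLoss`), alongside a reconstruction loss on the generated states.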
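For point 2, sampling subsequences could be as simple as something like this, assuming each full sequence is stored as a `(length, 100)` tensor (again, just a sketch):

```python
import random

def sample_subsequence(seq, min_len=2):
    # seq: (length, 100) tensor holding one full sequence [S0, ..., Sn]
    length = seq.size(0)
    sub_len = random.randint(min_len, length)    # random subsequence length
    start = random.randint(0, length - sub_len)  # random starting state
    return seq[start:start + sub_len]
```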

@krishnamurthy, thanks for the reply.

1- Do you know of any sample code that shows how to incorporate a (stop) condition inside the LSTM?
2- Training on subsequences is possible. However, I wanted to automatically use the previous output as the input of the next step.
3- Well, the first 20 dimensions are learnable. Within a sequence S = [S0, S1, …, Sn] the first 20 dimensions are fixed, but they are different from those of another sequence S’ = [S’0, S’1, …, S’m].
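
To make points 2 and 3 more concrete, here is a rough sketch of the behavior I’m after (all names and sizes are placeholders of my own): start from any state, feed each predicted state back in as the next input, and overwrite the first 20 ID dimensions with the ID of the starting state instead of asking the LSTM to keep them fixed.

```python
import torch
import torch.nn as nn

ID_DIM, STATE_DIM, HIDDEN_DIM = 20, 100, 128
cell = nn.LSTMCell(STATE_DIM, HIDDEN_DIM)
to_state = nn.Linear(HIDDEN_DIM, STATE_DIM)

def roll_out(s_start, n_steps):
    # s_start: (batch, STATE_DIM) -- can be S0, S2, or any later state
    fixed_id = s_start[:, :ID_DIM]               # constant within one sequence
    h = s_start.new_zeros(s_start.size(0), HIDDEN_DIM)
    c = torch.zeros_like(h)
    s_t, outputs = s_start, []
    for _ in range(n_steps):
        h, c = cell(s_t, (h, c))
        s_next = to_state(h)
        # keep the ID part fixed instead of relying on the LSTM to preserve it
        s_t = torch.cat([fixed_id, s_next[:, ID_DIM:]], dim=1)
        outputs.append(s_t)
    return torch.stack(outputs, dim=1)
```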