How to create an LSTM with 'one to many'

I created a ‘many-to-one’ model with an LSTM, and I want to transform it into a ‘one-to-many’ model, but I am not sure how to edit the code.

Below is the code for the current ‘many-to-one’ LSTM model.

import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

class LSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(LSTM, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # Set initial hidden and cell states 
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(device) 
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(device)
        # Forward propagate LSTM
        out, _ = self.lstm(x, (h0, c0))  # out: tensor of shape (batch_size, seq_length, hidden_size)
        # Decode the hidden state of the last time step
        out = self.fc(out[:, -1, :])
        return out

# One to Many
y(t-3)     y(t-2)    y(t-1)    y(t)
|          |         |         | 
cell ---> cell ---> cell ---> cell

# Many to One
cell ---> cell ---> cell ---> cell
|          |         |         |
x(t-3)     x(t-2)    x(t-1)    x(t)

I assume that with ‘one-to-many’, you mean to have one single input that is mapped to outputs at multiple time steps. The best approach would probably depend on your actual problem, but one way would be to have an initial trainable input vector that is simply fed as input for every single time step.
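As a rough illustration of that idea, here is a minimal sketch (the class name `OneToManyLSTM` and the way the single input is encoded into the initial hidden state are my own assumptions, not something fixed by the problem): a trainable vector is fed to the LSTM at every time step, and the one actual input only conditions the initial state.

    import torch
    import torch.nn as nn

    class OneToManyLSTM(nn.Module):
        """Hypothetical sketch of a one-to-many LSTM: one input vector,
        seq_length outputs. The input is encoded into the initial hidden
        state; a single trainable vector serves as the LSTM input at
        every time step."""
        def __init__(self, input_size, hidden_size, num_layers, num_classes, seq_length):
            super().__init__()
            self.num_layers = num_layers
            self.seq_length = seq_length
            # Trainable vector fed as input at every single time step
            self.step_input = nn.Parameter(torch.zeros(1, 1, hidden_size))
            # Project the single input into the initial hidden state
            self.encode = nn.Linear(input_size, hidden_size)
            self.lstm = nn.LSTM(hidden_size, hidden_size, num_layers, batch_first=True)
            self.fc = nn.Linear(hidden_size, num_classes)

        def forward(self, x):
            # x: (batch_size, input_size) -- one single input per sample
            batch_size = x.size(0)
            h0 = self.encode(x).unsqueeze(0).repeat(self.num_layers, 1, 1)
            c0 = torch.zeros_like(h0)
            # Broadcast the trainable vector across batch and time
            inp = self.step_input.expand(batch_size, self.seq_length, -1)
            out, _ = self.lstm(inp, (h0, c0))  # (batch_size, seq_length, hidden_size)
            return self.fc(out)                # (batch_size, seq_length, num_classes)

Note that, unlike the many-to-one model above, the decoder is applied at every time step rather than only to the last one.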

Before I get into too many details here, I would actually recommend this well-written SO answer, which explains the implementation details very well (it is for Keras, though).


Thank you!
I didn’t fully understand your answer, but thanks for pointing me to a relevant page. I will refer to it.

What about doing it like below?
Except for the input at one specific time step, the input at all remaining time steps is set to 0.

Except for the input at one specific time step, the input at all remaining time steps is set to 0.

That would certainly be one possible approach, but keep in mind that information in an LSTM, represented by the state vectors, “vanishes” across the time dimension. If you feed the input features only once, say at the first time step, it’s likely that they won’t be fully propagated to the later time steps, and hence the model would have a bias across the time dimension. Hence, you might want to use the same feature vector as input for every time step.
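To make the two options concrete, here is a small sketch (the shapes are arbitrary, chosen just for illustration) contrasting zero-padding all but one time step with repeating the same feature vector at every step:

    import torch

    batch_size, seq_length, input_size = 3, 4, 8
    x = torch.randn(batch_size, input_size)  # one input vector per sample

    # Option 1: feed the input only at the first time step, zeros elsewhere
    x_zeros = torch.zeros(batch_size, seq_length, input_size)
    x_zeros[:, 0, :] = x

    # Option 2: feed the same feature vector at every time step
    x_repeat = x.unsqueeze(1).expand(-1, seq_length, -1)

Either tensor can then be passed to an `nn.LSTM` with `batch_first=True`; the second option avoids the vanishing-across-time bias described above.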


Thanks! I learned a lot from you 🙂