My input data consists of 9 variables with a sequence length of 92. I want to predict a sequence of 7 other variables, but that sequence has a length of only 4. The shapes going into my model are: input X: [batch size, 92, 9] and target Y: [batch size, 4, 7].
One more detail: the two sequences (X and Y) co-occur, by which I mean that the sequence of 4 is not a continuation of the sequence of 92.
Initially I thought this would be no problem for an LSTM model, but my first model predicted a sequence of 92, not 4, simply because an LSTM produces one output per input time step. I then read that a different output sequence length can be achieved with an encoder/decoder architecture, but I think I am currently only pretending to use one:
```python
import torch
import torch.nn as nn


class Encoder(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(Encoder, self).__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)

    def forward(self, x):
        # nn.LSTM returns (output, (h_n, c_n)); keep only the last hidden state
        # hidden has shape [1, batch size, hidden size]
        _, (hidden, _) = self.lstm(x)
        return hidden


class Decoder(nn.Module):
    def __init__(self, hidden_size, input_size, output_size):
        super(Decoder, self).__init__()
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(hidden_size // 4, hidden_size, batch_first=True)
        self.linear = nn.Linear(hidden_size, output_size)

    def forward(self, hidden):
        # output has shape [batch size, 4, output size]
        hidden = torch.flatten(hidden, end_dim=1)  # [batch size, hidden size]
        hidden = hidden.unsqueeze(1)  # [batch size, 1, hidden size]
        hidden = hidden.reshape(-1, 4, self.hidden_size // 4)
        output, _ = self.lstm(hidden)
        output = self.linear(output)
        return output


class seq2seq(nn.Module):
    def __init__(self, input_size, output_size, hidden_size):
        super(seq2seq, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.encoder = Encoder(input_size, hidden_size)
        self.decoder = Decoder(hidden_size, input_size, output_size)

    def forward(self, x):
        # Encode the 92-step input, then decode into 4 steps
        return self.decoder(self.encoder(x))
```
My understanding is that the encoder wraps the input sequence into a code, and the decoder sort of unwraps it into the output sequence.
What I currently do is use the encoder class to obtain the last hidden state, which I then want to unwrap in the decoder class. But the whole reason I chose this architecture was to obtain an output sequence of 4, and I don't think I have figured that out yet. In practice I am manually reshaping the 'hidden' tensor: first into [batch size, hidden size], then adding a new dimension (unsqueeze), and then reshaping it into [batch size, 4, hidden size/4]. I tried something with 'teacher forcing', but I failed to understand it and eventually ran into the same problem.
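For comparison, here is a minimal sketch of the step-by-step decoding that encoder/decoder tutorials usually describe, as far as I understand it. The class name `StepDecoder`, the all-zeros start input, the hidden size of 32 and the `teacher_forcing_ratio` argument are assumptions made for illustration, not something taken from a reference implementation. The decoder is initialised with the encoder's final state and is run for exactly 4 steps, feeding each prediction (or, with teacher forcing, the true target step) back in as the next input.

```python
import random

import torch
import torch.nn as nn


class StepDecoder(nn.Module):
    """Hypothetical decoder that emits one time step per loop iteration."""

    def __init__(self, hidden_size, output_size, target_len=4):
        super().__init__()
        self.target_len = target_len
        self.output_size = output_size
        # The decoder input at each step is the previous output step
        self.lstm = nn.LSTM(output_size, hidden_size, batch_first=True)
        self.linear = nn.Linear(hidden_size, output_size)

    def forward(self, state, y=None, teacher_forcing_ratio=0.0):
        # state: (h_n, c_n) from the encoder, each [1, batch size, hidden size]
        h_n = state[0]
        # Assumed start input: a single all-zeros step
        step_in = torch.zeros(h_n.size(1), 1, self.output_size, device=h_n.device)
        outputs = []
        for t in range(self.target_len):
            out, state = self.lstm(step_in, state)
            pred = self.linear(out)  # [batch size, 1, output size]
            outputs.append(pred)
            # Teacher forcing: sometimes feed the true target step instead
            use_truth = y is not None and random.random() < teacher_forcing_ratio
            step_in = y[:, t:t + 1, :] if use_truth else pred
        return torch.cat(outputs, dim=1)  # [batch size, target_len, output size]


encoder = nn.LSTM(9, 32, batch_first=True)
decoder = StepDecoder(hidden_size=32, output_size=7)

x = torch.randn(5, 92, 9)  # [batch size, 92, 9]
y = torch.randn(5, 4, 7)   # [batch size, 4, 7]

_, state = encoder(x)      # state = (h_n, c_n)
pred = decoder(state, y, teacher_forcing_ratio=0.5)
print(pred.shape)  # torch.Size([5, 4, 7])
```

In this version the output length comes from the loop count rather than from reshaping the hidden state, and at inference time the decoder would be called without `y`.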
What am I missing in my architecture? Or is this ok?