I have input data consisting of 9 variables with a sequence length of 92, and I want to predict a sequence of 7 other variables, which however has a sequence length of 4. The shapes going into my model would be the following: input X: [batch size, 92, 9] and target Y: [batch size, 4, 7].
Further information: both sequences (the X sequence and the Y sequence) co-occur, by which I mean that the sequence of 4 is not an extension of the sequence of 92.
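To make the shapes concrete, a dummy batch would look like this (the batch size of 32 is arbitrary, just for illustration):

import torch

batch_size = 32                      # arbitrary, just for illustration
X = torch.randn(batch_size, 92, 9)   # 92 time steps, 9 input variables
Y = torch.randn(batch_size, 4, 7)    # 4 time steps, 7 target variables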
Initially I thought this would be no problem for an LSTM model, but my initial model predicted a sequence of 92, not 4, simply because the input has length 92. I then read that you can get a different output sequence length by using an encoder/decoder architecture, but I think my current model is only pretending to be one:
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(Encoder, self).__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)

    def forward(self, x):
        # hidden is the tuple (h_n, c_n); hidden[0] has shape [1, batch size, hidden size]
        _, hidden = self.lstm(x)
        return hidden
class Decoder(nn.Module):
    def __init__(self, hidden_size, input_size, output_size):
        super(Decoder, self).__init__()
        self.hidden_size = hidden_size
        # the reshape in forward() requires hidden_size to be divisible by 4
        self.lstm = nn.LSTM(int(hidden_size / 4), hidden_size, batch_first=True)
        self.linear = nn.Linear(hidden_size, output_size)

    def forward(self, hidden):
        # output has shape [batch size, 4, output size]
        hidden = torch.flatten(hidden[0], end_dim=1)                # [batch size, hidden size]
        hidden = hidden.unsqueeze(1)                                # [batch size, 1, hidden size]
        hidden = hidden.reshape(-1, 4, int(self.hidden_size / 4))   # [batch size, 4, hidden size/4]
        output, _ = self.lstm(hidden)
        output = self.linear(output)
        return output
class seq2seq(nn.Module):
    def __init__(self, input_size, output_size, hidden_size):
        super(seq2seq, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.encoder = Encoder(input_size, hidden_size)
        self.decoder = Decoder(hidden_size, input_size, output_size)

    def forward(self, x):
        # encode the full 92-step input, then decode the last hidden state into the 4-step output
        hidden = self.encoder(x)
        return self.decoder(hidden)
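For reference, this is roughly how I sanity-check the output shape (hidden_size = 64 is just an example; it needs to be divisible by 4 for the reshape to work):

model = seq2seq(input_size=9, output_size=7, hidden_size=64)
X = torch.randn(32, 92, 9)   # dummy input batch
out = model(X)
print(out.shape)             # torch.Size([32, 4, 7]) -> the target shape, but only because of the reshaping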
My understanding is that the encoder wraps the input sequence into a code, and the decoder sort of unwraps it into the output sequence.
What I currently do is use the Encoder class to obtain the last hidden state, which I then want to unwrap in the Decoder class. But the reason I use this architecture in the first place is to obtain an output sequence of length 4, and I don't think I have actually achieved that. All I really do is manually reshape the 'hidden' tensor: first into [batch size, hidden size], then adding a new dimension (unsqueeze), and then reshaping it into [batch size, 4, hidden size/4]. I also tried something with 'teacher forcing', but I failed to understand it and eventually ran into the same problem.
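For what it's worth, this is roughly the direction I was trying to go before falling back on the reshape trick: a decoder that unrolls step by step for exactly 4 steps, feeding its previous output back in. The zero start input and the fixed loop of 4 are my own guesses at how this is supposed to work, not something I'm sure is the standard approach:

class StepDecoder(nn.Module):
    def __init__(self, hidden_size, output_size, target_len=4):
        super(StepDecoder, self).__init__()
        self.target_len = target_len
        self.lstm = nn.LSTM(output_size, hidden_size, batch_first=True)
        self.linear = nn.Linear(hidden_size, output_size)

    def forward(self, hidden):
        batch_size = hidden[0].shape[1]
        # start from a zero "previous output" and unroll for exactly target_len steps
        step_input = torch.zeros(batch_size, 1, self.linear.out_features, device=hidden[0].device)
        outputs = []
        for _ in range(self.target_len):
            out, hidden = self.lstm(step_input, hidden)   # one decoding step
            step_input = self.linear(out)                 # [batch size, 1, output size]
            outputs.append(step_input)
        return torch.cat(outputs, dim=1)                  # [batch size, 4, output size]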
What am I missing in my architecture? Or is this ok?