LSTM: Encoder/Decoder architecture for different input/output sequence lengths

My input data consists of 9 variables with a sequence length of 92. I want to predict a sequence of 7 other variables, but that sequence has a length of only 4. The shapes going into my model are therefore: input X: [batch size, 92, 9] and target Y: [batch size, 4, 7].

One more detail: the two sequences (X and Y) co-occur, by which I mean that the sequence of 4 is not a continuation of the sequence of 92.

Initially I thought this would be no problem for an LSTM model. However, my first model predicted a sequence of 92, not 4, simply because the LSTM emits one output per input time step. I then read that a different output sequence length can be achieved with an encoder/decoder architecture, but I think I am currently only pretending to use one:

import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(Encoder, self).__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)

    def forward(self, x):
        # hidden is the tuple (h_n, c_n); hidden[0] has shape [1, batch size, hidden size]
        _, hidden = self.lstm(x)
        return hidden

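class Decoder(nn.Module):
    def __init__(self, hidden_size, input_size, output_size):
        super(Decoder, self).__init__()
        self.hidden_size = hidden_size

        # input_size is accepted but currently unused
        # hidden_size must be divisible by 4 for the reshape in forward() to work
        self.lstm = nn.LSTM(hidden_size // 4, hidden_size, batch_first=True)
        self.linear = nn.Linear(hidden_size, output_size)

    def forward(self, hidden):
        # hidden[0]: [1, batch size, hidden size] -> [batch size, hidden size]
        hidden = torch.flatten(hidden[0], end_dim=1)
        # -> [batch size, 1, hidden size]
        hidden = hidden.unsqueeze(1)
        # -> [batch size, 4, hidden size / 4]: the state vector is cut into 4 chunks
        hidden = hidden.reshape(-1, 4, self.hidden_size // 4)
        output, _ = self.lstm(hidden)
        # output has shape [batch size, 4, output size]
        output = self.linear(output)
        return output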

class seq2seq(nn.Module):
    def __init__(self, input_size, output_size, hidden_size):
        super(seq2seq, self).__init__()

        self.input_size = input_size
        self.hidden_size = hidden_size

        self.encoder = Encoder(input_size, hidden_size)
        self.decoder = Decoder(hidden_size, input_size, output_size)

    def forward(self, x):
        # x: [batch size, 92, 9] -> output: [batch size, 4, output size]
        hidden = self.encoder(x)
        return self.decoder(hidden)
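
To make the shape issue concrete, here is a minimal check of what a plain nn.LSTM returns for my input shape. The batch size of 8 and hidden size of 64 are assumed values for illustration only; they are not from my real setup.

```python
import torch
import torch.nn as nn

# Assumed illustrative sizes: batch of 8, hidden size 64
batch_size, seq_len, n_features, hidden_size = 8, 92, 9, 64

lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
x = torch.randn(batch_size, seq_len, n_features)

output, (h_n, c_n) = lstm(x)
output_shape = tuple(output.shape)  # one output vector per input time step
h_n_shape = tuple(h_n.shape)        # only the final hidden state of the single layer

print(output_shape)  # (8, 92, 64)
print(h_n_shape)     # (1, 8, 64)
```

So the per-step output always has length 92, while the final hidden state carries no time dimension at all, which is why the length of 4 has to come from somewhere in the decoder.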

My understanding is that the encoder wraps the input sequence into a code, and the decoder sort of unwraps it into the output sequence.
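
One way I understand this wrapping/unwrapping could be written, as a sketch rather than a confirmed solution, is to feed the encoder's final hidden state to the decoder at each of the 4 output steps. The class name RepeatDecoder, the target_len parameter, and the sizes (batch 8, hidden 64) are hypothetical and chosen only for illustration.

```python
import torch
import torch.nn as nn

class RepeatDecoder(nn.Module):
    """Hypothetical decoder: repeats the encoder's last hidden state for every output step."""
    def __init__(self, hidden_size, output_size, target_len):
        super().__init__()
        self.target_len = target_len
        self.lstm = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.linear = nn.Linear(hidden_size, output_size)

    def forward(self, hidden):
        h_n, c_n = hidden                  # each [1, batch size, hidden size]
        context = h_n[-1]                  # [batch size, hidden size]
        # Copy the context vector target_len times along a new time axis
        dec_in = context.unsqueeze(1).repeat(1, self.target_len, 1)
        # Start the decoder LSTM from the encoder's final state
        output, _ = self.lstm(dec_in, hidden)
        return self.linear(output)         # [batch size, target_len, output size]

# Usage with assumed sizes: batch of 8, hidden size 64
encoder_lstm = nn.LSTM(9, 64, batch_first=True)
x = torch.randn(8, 92, 9)
_, hidden = encoder_lstm(x)
y_hat = RepeatDecoder(hidden_size=64, output_size=7, target_len=4)(hidden)
print(tuple(y_hat.shape))  # (8, 4, 7)
```

Here the output length is set explicitly by target_len, and the hidden size no longer needs to be divisible by 4.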

What I currently do is use the encoder class to obtain the last hidden state, which I then want to unwrap in the decoder class. But the reason I chose this architecture in the first place was to obtain an output sequence of length 4, and I don’t think I have figured that out yet. In practice I manually flatten the ‘hidden’ tensor to [batch size, hidden size], add a new dimension (unsqueeze), and then reshape it to [batch size, 4, hidden size/4]. I tried something with ‘teacher forcing’, but I failed to understand it and eventually ran into the same problem.
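
For reference, this is roughly how I understand a step-by-step decoder with teacher forcing is usually structured; it is a sketch under assumptions, not my actual attempt. The names StepDecoder, target_len and teacher_forcing_ratio are hypothetical, and using an all-zeros vector as the first decoder input is an assumption.

```python
import random
import torch
import torch.nn as nn

class StepDecoder(nn.Module):
    """Hypothetical autoregressive decoder that emits one output step at a time."""
    def __init__(self, hidden_size, output_size, target_len):
        super().__init__()
        self.target_len = target_len
        self.output_size = output_size
        self.cell = nn.LSTMCell(output_size, hidden_size)
        self.linear = nn.Linear(hidden_size, output_size)

    def forward(self, hidden, target=None, teacher_forcing_ratio=0.5):
        # Take the last layer of the encoder state: [batch size, hidden size]
        h, c = hidden[0][-1], hidden[1][-1]
        # Assumption: an all-zeros vector serves as the first decoder input
        step_in = torch.zeros(h.size(0), self.output_size, device=h.device)
        outputs = []
        for t in range(self.target_len):
            h, c = self.cell(step_in, (h, c))
            pred = self.linear(h)          # [batch size, output size]
            outputs.append(pred)
            # Teacher forcing: sometimes feed the true target instead of the prediction
            use_truth = target is not None and random.random() < teacher_forcing_ratio
            step_in = target[:, t] if use_truth else pred
        return torch.stack(outputs, dim=1)  # [batch size, target_len, output size]

# Usage with assumed sizes: batch of 8, hidden size 64
encoder_lstm = nn.LSTM(9, 64, batch_first=True)
x = torch.randn(8, 92, 9)
y = torch.randn(8, 4, 7)
_, hidden = encoder_lstm(x)
decoder = StepDecoder(hidden_size=64, output_size=7, target_len=4)
y_train = decoder(hidden, target=y)  # training: targets may replace predictions as inputs
y_infer = decoder(hidden)            # inference: the decoder feeds on its own predictions
```

In this form the loop runs exactly target_len times, so the output length is independent of the input length of 92.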

What am I missing in my architecture? Or is this ok?