Combine LSTM and CNN in the final layer to get the final result

Is it possible to build such a model structure to get the final result:

Notation: N is batch_size, h is lstm hidden_size

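The output of the CNN would be of shape [N, C, W, H].
Then permute it to [N, W, H, C] and reshape it to [N, W * H, C]; reshaping directly from [N, C, W, H] would mix channel and spatial values.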

The output of the LSTM at the last time step would be [N, 1, h].
You can then call .repeat(1, W * H, 1) on this LSTM output,
so that it has shape [N, W * H, h].

Next, concatenate the CNN and LSTM outputs along dim=2, so that the overall output shape is [N, W * H, C + h].

Finally, pass it to a linear layer, which acts on the last dimension (C + h), and continue from there.
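
The steps above can be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the class name CnnLstmFusion, the single Conv2d layer, and all layer sizes are assumptions made up for the example. It moves the channel dimension last with permute before flattening, since a direct reshape of [N, C, W, H] would mix channel and spatial values.

```python
import torch
import torch.nn as nn


class CnnLstmFusion(nn.Module):
    """Hypothetical module: fuses a CNN feature map with the last LSTM output."""

    def __init__(self, in_channels=3, cnn_channels=8,
                 input_size=10, hidden_size=16, out_features=4):
        super().__init__()
        # Stand-in CNN: one conv layer that keeps the spatial size (assumption).
        self.cnn = nn.Conv2d(in_channels, cnn_channels, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.linear = nn.Linear(cnn_channels + hidden_size, out_features)

    def forward(self, image, sequence):
        feat = self.cnn(image)                       # [N, C, W, H]
        n, c, width, height = feat.shape
        # Channels last, then flatten the two spatial dims into one.
        feat = feat.permute(0, 2, 3, 1).reshape(n, width * height, c)  # [N, W*H, C]

        out, _ = self.lstm(sequence)                 # [N, T, h]
        last = out[:, -1:, :]                        # [N, 1, h], last time step
        last = last.repeat(1, width * height, 1)     # [N, W*H, h]

        fused = torch.cat([feat, last], dim=2)       # [N, W*H, C + h]
        return self.linear(fused)                    # [N, W*H, out_features]


model = CnnLstmFusion()
image = torch.randn(2, 3, 5, 6)      # N=2, 3 input channels, 5 x 6 spatial grid
sequence = torch.randn(2, 7, 10)     # N=2, 7 time steps, 10 features per step
result = model(image, sequence)
print(result.shape)                  # torch.Size([2, 30, 4])
```

Since the repeated LSTM vector is identical at every position, last.expand(-1, width * height, -1) would give the same concatenated result without copying memory.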


Thank you very much. This is my first time combining these two models, so I appreciate your suggestions.