Match CNN output with LSTM input dimensions

natalie-sch · May 5, 2022, 1:29pm

I would like to build a hybrid CNN-LSTM model. My training samples have the following shape: (21, 6000).

I built the CNN with the following layers

from torchsummary import summary
summary(model, (21, 6000))

returns

I would like to take the output of the CNN (for example, the output of the layer before softmax) and use it as the input to the LSTM. To align the CNN output with the LSTM input, I looked into examples that use padding='same' and permute.
Could you give me a hint on what input shape the LSTM would expect here and how I could transform the output shape of the CNN to match that input shape?

As mentioned in CNN with LSTM input shapes - #9 by ananda2020, the following produces the input shape the LSTM requires:

import torch

x = torch.randn(1, 2, 394, 1) # [batch_size, channels, height, width]
x = x.view(x.size(0), -1, x.size(3)) # [batch_size, features=channels*height, seq_len=width]
x = x.permute(2, 0, 1) # [seq_len, batch_size, features]
x.size()

returns

torch.Size([1, 1, 788])
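
For samples of shape (21, 6000), one possible wiring is sketched below. This is only a sketch under stated assumptions: it treats 21 as the channel dimension and 6000 as the time dimension, and the class name CNNLSTM, the layer sizes, and the number of classes are all hypothetical, since the actual CNN layers are not shown above. The idea is that nn.Conv1d returns [batch, channels, length], while nn.LSTM with batch_first=True expects [batch, seq_len, features], so a single permute is enough to line them up.

```python
import torch
import torch.nn as nn


class CNNLSTM(nn.Module):
    """Hypothetical Conv1d -> LSTM hybrid; all layer sizes are illustrative."""

    def __init__(self, in_channels=21, conv_channels=32, hidden_size=64, num_classes=5):
        super().__init__()
        self.cnn = nn.Sequential(
            # kernel_size=5 with padding=2 keeps the length at 6000
            nn.Conv1d(in_channels, conv_channels, kernel_size=5, padding=2),
            nn.ReLU(),
            # pooling by 4 shortens the sequence from 6000 to 1500
            nn.MaxPool1d(kernel_size=4),
        )
        # input_size must equal the number of CNN output channels
        self.lstm = nn.LSTM(input_size=conv_channels, hidden_size=hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        x = self.cnn(x)          # [batch, conv_channels, seq_len]
        x = x.permute(0, 2, 1)   # [batch, seq_len, conv_channels]
        out, _ = self.lstm(x)    # [batch, seq_len, hidden_size]
        return self.fc(out[:, -1, :])  # classify from the last time step


model = CNNLSTM()
x = torch.randn(4, 21, 6000)     # [batch, channels, length]
features = model.cnn(x)
logits = model(x)
print(features.shape)  # torch.Size([4, 32, 1500])
print(logits.shape)    # torch.Size([4, 5])
```

With the default batch_first=False, the LSTM would instead expect [seq_len, batch, features], in which case the permute would be x.permute(2, 0, 1), as in the snippet above.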