Hello.
I am trying to use a CNN-LSTM on audio data (e.g. spectrogram images).
The output shape of my CNN model is [batch_size, features], and the LSTM expects an input of shape [batch_size, seq_len, features].
My question is how to get the seq_len dimension from the CNN’s output.
Also, I want to study CNN-LSTMs, so if you have any good materials, please share them.
I assume the input for your CNN – i.e., a current batch – contains all the spectrogram images to form the time series you want to give the LSTM as input. Otherwise I wouldn’t know where your sequence comes from.
In this case the batch_size of the CNN becomes the seq_len for the LSTM, and for the latter, the batch_size is just 1 (as you don’t have batches with multiple sequences).
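As a minimal sketch of that reshaping, assuming PyTorch and an LSTM created with batch_first=True (neither is stated in the question): the toy CNN, the 64x64 single-channel image size and the values of seq_len, n_features and hidden_size are all made up for illustration and stand in for your real model.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
seq_len, n_features, hidden_size = 8, 32, 16

# Toy CNN standing in for the real model: maps each spectrogram image
# (1 channel, 64x64) to a flat feature vector of length n_features.
cnn = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(4, n_features),
)

# batch_first=True makes the LSTM expect [batch_size, seq_len, features].
lstm = nn.LSTM(input_size=n_features, hidden_size=hidden_size, batch_first=True)

# One time series of seq_len spectrogram images, given to the CNN as a batch.
frames = torch.randn(seq_len, 1, 64, 64)

features = cnn(frames)               # [seq_len, n_features]
lstm_input = features.unsqueeze(0)   # [1, seq_len, n_features]: one sequence

output, (h_n, c_n) = lstm(lstm_input)
print(features.shape, lstm_input.shape, output.shape)
```

If you later have several sequences of the same length per batch, one common option is to stack all their images along the first dimension for the CNN and then reshape the features with something like features.view(n_sequences, seq_len, -1) before the LSTM.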