Sorry, I am not sure I follow the question. If the order of layers is 1 --> 2 --> 3, that means the RNN comes after the CNN, and of course the CNN won't need anything from the RNN. Instead, if the order is 3 --> 2 --> 1, you can simply choose not to use the hidden outputs from the RNN when going into the CNN.
If `h` has no role to play in the fully connected linear layer, `output, _ = rnn(x)` should suffice, unless you want to initialize `h` yourself, in which case `output, _ = rnn(x, h)` will do.
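A minimal sketch of both call patterns (all sizes here are made-up for illustration): if you don't pass `h`, PyTorch zero-initializes it for you, and discarding the returned hidden state with `_` is fine when only `output` feeds the linear layer.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, just for illustration.
batch, seq_len, input_size, hidden_size = 4, 10, 8, 16

rnn = nn.RNN(input_size, hidden_size, batch_first=True)
x = torch.randn(batch, seq_len, input_size)

# No h passed: PyTorch zero-initializes the hidden state internally.
output, _ = rnn(x)                        # output: (batch, seq_len, hidden_size)

# Only pass h if you want a custom initial hidden state.
h0 = torch.zeros(1, batch, hidden_size)   # (num_layers, batch, hidden_size)
output, _ = rnn(x, h0)

# Feed the last time step into a fully connected layer.
fc = nn.Linear(hidden_size, 2)
logits = fc(output[:, -1])                # (batch, 2)
```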
If you want to use the hidden outputs outside your forward function (for example, in seq2seq, where the hidden outputs from the encoder are used in a decoder), then yes, your formulation is fine (I am assuming the `hidden` passed into the forward function is passed on to the rnn as `h`).
Hi, I am having the same issue. I would like to do a CNN-GRU speaker identification task on preprocessed spectrograms. How could you connect the two different NNs?
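One common way to connect them is to run the CNN over the spectrogram, then treat the time axis of its feature maps as the sequence dimension for the GRU. Here is a hedged sketch; all layer sizes, `n_mels`, and `n_speakers` are made-up assumptions, not anything from your setup:

```python
import torch
import torch.nn as nn

class CNNGRU(nn.Module):
    """Illustrative CNN front-end feeding a GRU; sizes are arbitrary."""
    def __init__(self, n_mels=64, n_speakers=10, hidden_size=128):
        super().__init__()
        # CNN over (batch, 1, n_mels, time) spectrograms.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),   # halves both freq and time dims
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # After two 2x poolings the freq dim is n_mels // 4.
        self.gru = nn.GRU(64 * (n_mels // 4), hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, n_speakers)

    def forward(self, x):
        f = self.cnn(x)            # (batch, C, F, T)
        f = f.permute(0, 3, 1, 2)  # (batch, T, C, F): time becomes the sequence
        f = f.flatten(2)           # (batch, T, C*F)
        _, h = self.gru(f)         # h: (num_layers, batch, hidden_size)
        return self.fc(h[-1])      # (batch, n_speakers)

x = torch.randn(4, 1, 64, 100)     # a batch of 4 fake spectrograms
logits = CNNGRU()(x)               # (4, 10)
```

The key step is the `permute` + `flatten`: it collapses channels and frequency into one feature vector per time step, so the GRU sees a `(batch, time, features)` sequence.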