How to make the output of a CNN the input of an RNN layer?

Hi,
I am new to CNNs, RNNs, and deep learning.
I am trying to build an architecture that combines a CNN and an RNN.
input image size = [20, 3, 48, 48]
CNN output size = [20, 64, 48, 48]
Now I want the CNN output to be the RNN input, but as far as I know the input of an RNN must be 3-dimensional, i.e. [seq_len, batch, input_size].
How can I turn the 4-dimensional [20, 64, 48, 48] tensor into a 3-dimensional one for the RNN input?

You would have to decide which dimension(s) should be the temporal dimension (seq_len) and which the features (input_size).
E.g. you could treat the output channels as the features and the spatial dimensions (height and width) as the temporal dimension.
To do so, you could first flatten the spatial dimensions via:

output = output.view(output.size(0), output.size(1), -1)

and then permute the dimensions via:

output = output.permute(2, 0, 1)
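Putting the two steps together, a minimal sketch (assuming the shapes from your post and an `nn.RNN` whose `hidden_size` is chosen arbitrarily here) could look like this:

```python
import torch
import torch.nn as nn

# Hypothetical CNN output: [batch, channels, height, width]
output = torch.randn(20, 64, 48, 48)

# Flatten the spatial dimensions: [20, 64, 48*48] = [20, 64, 2304]
output = output.view(output.size(0), output.size(1), -1)

# Permute to [seq_len, batch, input_size] = [2304, 20, 64]
output = output.permute(2, 0, 1)

# Channels act as features, flattened pixels as the sequence
rnn = nn.RNN(input_size=64, hidden_size=64)
out, hidden = rnn(output)
print(out.shape)  # torch.Size([2304, 20, 64])
```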

But what if we use batch_first=True in the RNN layer? @ptrblck

In that case, the permutation should be:

output = output.permute(0, 2, 1)

since the RNN expects the input as [batch_size, seq_len, features].


Many thanks @ptrblck
After your solution my input size changed to [2304, 20, 64], and after passing through the RNN layer the output has the same size. So I tried to reshape it back to [20, 64, 48, 48], because my network is CNN > RNN > CNN.
But somehow, after passing through the RNN layer, some data in the tensor seems to have changed.
While trying to feed the output back into the CNN I got this error:

TypeError: conv2d(): argument 'input' (position 1) must be Tensor, not tuple

Do you have any idea about this problem?

nn.RNN returns a tuple of (output, hidden_state) as described in the docs, so you would have to unpack it and pass only the tensor you need to the next layer.
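A minimal sketch of what the fix could look like for your CNN > RNN > CNN setup, using the shapes from the thread (the second conv layer here is just a placeholder):

```python
import torch
import torch.nn as nn

x = torch.randn(2304, 20, 64)        # [seq_len, batch, input_size] from the first CNN
rnn = nn.RNN(input_size=64, hidden_size=64)

# Unpack the tuple instead of passing it on directly
out, hidden = rnn(x)                 # out: [2304, 20, 64]

# Reshape back to an image-like [20, 64, 48, 48] for the second CNN.
# reshape (rather than view) handles the non-contiguous result of permute.
out = out.permute(1, 2, 0).reshape(20, 64, 48, 48)

conv = nn.Conv2d(64, 64, kernel_size=3, padding=1)  # placeholder second CNN layer
y = conv(out)
print(y.shape)  # torch.Size([20, 64, 48, 48])
```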
