Hi,

I am new to CNNs, RNNs and deep learning.

I am trying to build an architecture that combines a CNN and an RNN.

Input image size = [20, 3, 48, 48]

CNN output size = [20, 64, 48, 48]

Now I want the CNN output to be the RNN input,

but as far as I know, the input of an RNN must be 3-dimensional, i.e. [seq_len, batch, input_size].

How can I make 4-dimensional [20,64,48,48] tensor into 3-dimensional for RNN input?

You would have to decide which dimension(s) should be the temporal dimension (`seq_len`) and which the features (`input_size`).

E.g. you could treat the output channels as the features and the spatial dimensions (height and width) as the temporal dimension.

To do so, you could first flatten the spatial dimensions via:

```
output = output.view(output.size(0), output.size(1), -1)
```

and then permute the dimensions via:

```
output = output.permute(2, 0, 1)
```
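A minimal sketch of the two steps above, assuming the default `nn.RNN` (`batch_first=False`, so it expects `[seq_len, batch, input_size]`) and an arbitrary `hidden_size=64` chosen only for illustration. Random data stands in for the CNN output:

```python
import torch
import torch.nn as nn

# Stand-in for the CNN output from the question: [batch, channels, height, width]
output = torch.randn(20, 64, 48, 48)

# Flatten the spatial dimensions: [20, 64, 48*48] = [20, 64, 2304]
output = output.view(output.size(0), output.size(1), -1)

# Move the flattened spatial dim to the front: [seq_len, batch, features] = [2304, 20, 64]
output = output.permute(2, 0, 1)

# Default nn.RNN (batch_first=False); hidden_size=64 is an illustrative choice
rnn = nn.RNN(input_size=64, hidden_size=64)
rnn_out, h_n = rnn(output)
# rnn_out: [2304, 20, 64], h_n: [1, 20, 64]
```

Each of the 48*48 = 2304 pixel positions becomes one time step with a 64-dimensional feature vector (one value per output channel).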

If you create the RNN with `batch_first=True`, the permutation should instead be:

```
output = output.permute(0, 2, 1)
```

since the RNN then expects the input as `[batch_size, seq_len, features]`.
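A short sketch of the `batch_first=True` variant, again with random data and an illustrative `hidden_size=64`. Note that `batch_first` only changes the layout of the input and output tensors; the hidden state `h_n` keeps the shape `[num_layers, batch, hidden_size]`:

```python
import torch
import torch.nn as nn

output = torch.randn(20, 64, 48, 48)                      # [batch, channels, height, width]
output = output.view(output.size(0), output.size(1), -1)  # [20, 64, 2304]
output = output.permute(0, 2, 1)                          # [batch, seq_len, features] = [20, 2304, 64]

rnn = nn.RNN(input_size=64, hidden_size=64, batch_first=True)
rnn_out, h_n = rnn(output)
# rnn_out: [20, 2304, 64]; h_n is unaffected by batch_first: [1, 20, 64]
```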

Many thanks @ptrblck!

After applying your solution, my input size changed to `[2304, 20, 64]`.

After the RNN layer my output has the same size, so I tried to reshape it back to `[20, 64, 48, 48]`,

because my network is CNN > RNN > CNN.

But somehow, after the RNN layer, some of the data in the tensor has changed.

When I try to feed the output back into the CNN, I get this error:

```
TypeError: conv2d(): argument 'input' (position 1) must be Tensor, not tuple
```

Do you have any idea about this problem?

`nn.RNN` outputs a tuple as `output, hidden_state`, as described in the docs, so you might want to use only one of these outputs for further processing.
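A sketch of the full CNN > RNN > CNN round trip, under these assumptions: `hidden_size` equals the channel count (64), otherwise the result cannot be reshaped back to `[20, 64, 48, 48]`, and `conv` is a hypothetical second CNN layer. Besides unpacking the tuple, the permutation has to be undone before restoring the spatial dimensions; calling `view(20, 64, 48, 48)` directly on a `[2304, 20, 64]` tensor would reinterpret the memory layout and mix up values from different positions, which could explain the "changed" data:

```python
import torch
import torch.nn as nn

batch, channels, height, width = 20, 64, 48, 48
cnn_out = torch.randn(batch, channels, height, width)

# CNN output -> RNN input: [seq_len, batch, features] = [2304, 20, 64]
rnn_in = cnn_out.view(batch, channels, -1).permute(2, 0, 1)

# hidden_size == channels so the output can be reshaped back to the CNN layout
rnn = nn.RNN(input_size=channels, hidden_size=channels)

# nn.RNN returns (output, h_n); keep only the per-step output tensor
rnn_out, _ = rnn(rnn_in)  # [2304, 20, 64]

# Undo the permutation first ([2304, 20, 64] -> [20, 64, 2304]), then unflatten
# the spatial dims. reshape (not view) is used because the permuted tensor is
# not contiguous in memory.
cnn_in = rnn_out.permute(1, 2, 0).reshape(batch, channels, height, width)

# Hypothetical second CNN layer consuming the restored tensor
conv = nn.Conv2d(channels, 64, kernel_size=3, padding=1)
y = conv(cnn_in)  # [20, 64, 48, 48]
```

Applying the same `permute(1, 2, 0).reshape(...)` directly to `rnn_in` recovers `cnn_out` exactly, which is a quick way to check that the forward and inverse reshapes match.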

Hi,

I am trying to feed the output features of my 3D CNN to a GRU.

The inputs to my 3D CNN are preprocessed videos stored in NumPy arrays of shape

[70, 1, 29, 88, 88], where the dimensions correspond to [batch size, number of channels, number of frames, height, width].

Here is my 3D CNN:

```
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv3D_output_size(img_size, padding, kernel_size, stride):
    # Helper not shown in the post; assumed to apply the standard Conv3d
    # output-size formula (dilation 1) to each of the (t, x, y) dimensions.
    return tuple((img_size[i] + 2 * padding[i] - kernel_size[i]) // stride[i] + 1 for i in range(3))


class CNN3D(nn.Module):
    def __init__(self, t_dim=29, img_x=88, img_y=88, drop_p=0.2, fc_hidden1=256, fc_hidden2=128, num_classes=2):
        super(CNN3D, self).__init__()
        # set video dimension
        self.t_dim = t_dim
        self.img_x = img_x
        self.img_y = img_y
        # fully connected layer hidden nodes
        self.fc_hidden1, self.fc_hidden2 = fc_hidden1, fc_hidden2
        self.drop_p = drop_p
        self.num_classes = num_classes
        self.ch1, self.ch2 = 32, 48
        self.k1, self.k2 = (5, 5, 5), (3, 3, 3)  # 3d kernel size
        self.s1, self.s2 = (2, 2, 2), (2, 2, 2)  # 3d strides
        self.pd1, self.pd2 = (0, 0, 0), (0, 0, 0)  # 3d padding
        # compute conv1 & conv2 output shape
        self.conv1_outshape = conv3D_output_size((self.t_dim, self.img_x, self.img_y), self.pd1, self.k1, self.s1)
        self.conv2_outshape = conv3D_output_size(self.conv1_outshape, self.pd2, self.k2, self.s2)
        self.conv1 = nn.Conv3d(in_channels=1, out_channels=self.ch1, kernel_size=self.k1, stride=self.s1,
                               padding=self.pd1)
        self.bn1 = nn.BatchNorm3d(self.ch1)
        self.conv2 = nn.Conv3d(in_channels=self.ch1, out_channels=self.ch2, kernel_size=self.k2, stride=self.s2,
                               padding=self.pd2)
        self.bn2 = nn.BatchNorm3d(self.ch2)
        self.relu = nn.ReLU(inplace=True)
        self.drop = nn.Dropout3d(self.drop_p)
        self.pool = nn.MaxPool3d(2)
        self.fc1 = nn.Linear(self.ch2 * self.conv2_outshape[0] * self.conv2_outshape[1] * self.conv2_outshape[2],
                             self.fc_hidden1)  # fully connected hidden layer
        self.fc2 = nn.Linear(self.fc_hidden1, self.fc_hidden2)
        self.fc3 = nn.Linear(self.fc_hidden2, self.num_classes)  # fully connected layer, output = multi-classes

    def forward(self, x_3d):
        # print(x_3d.shape)  # [70, 1, 29, 88, 88]
        # Conv 1
        x = self.conv1(x_3d)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.drop(x)
        # print(x.shape)  # [70, 32, 13, 42, 42]
        # Conv 2
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu(x)
        x = self.drop(x)
        # print(x.shape)  # [70, 48, 6, 20, 20]
        # FC 1 and 2
        x = x.view(x.size(0), -1)
        # print(x.shape)  # [70, 115200]
        x = F.relu(self.fc1(x))
        # print(x.shape)  # [70, 256]
        x = F.relu(self.fc2(x))
        # print(x.shape)  # [70, 128]
        x = F.dropout(x, p=self.drop_p, training=self.training)
        x = self.fc3(x)
        # print(x.shape)  # [70, 2]
        return x
```

I tried to add a GRU layer before the linear layers, but I could not, because the GRU expects an input of shape (batch size, sequence length, input size) and my 3D CNN reduces the number of frames from 29 (to 6 after `conv2`).

How can I feed the features extracted by the 3D CNN to a GRU?
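One possible approach, applying the same idea as earlier in the thread (pick a temporal dimension, fold the rest into features), is sketched below. It assumes the reduced frame dimension (6 after `conv2`) is used as `seq_len` and that channels, height and width are flattened into the feature vector. `hidden_size=256` and the final `nn.Linear` head are illustrative choices, not part of the original model:

```python
import torch
import torch.nn as nn

# Stand-in for the conv2 output from the post: [batch, channels, frames, height, width]
x = torch.randn(70, 48, 6, 20, 20)

b, c, t, h, w = x.shape
# Put frames right after batch, then flatten the per-frame features:
# [70, 48, 6, 20, 20] -> [70, 6, 48, 20, 20] -> [70, 6, 48*20*20] = [70, 6, 19200]
seq = x.permute(0, 2, 1, 3, 4).reshape(b, t, c * h * w)

# batch_first=True so the GRU expects [batch, seq_len, input_size]
gru = nn.GRU(input_size=c * h * w, hidden_size=256, batch_first=True)
gru_out, h_n = gru(seq)  # gru_out: [70, 6, 256], h_n: [1, 70, 256]

# Illustrative classifier on the last time step (num_classes=2 as in the post)
fc = nn.Linear(256, 2)
logits = fc(gru_out[:, -1])  # [70, 2]
```

If 6 time steps are too few, one option is to use a temporal stride of 1 in the `Conv3d` layers (e.g. `stride=(1, 2, 2)`), which keeps more frames in the output while still downsampling height and width.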