Hi,
I am new to CNNs, RNNs, and deep learning.
I am trying to build an architecture that combines a CNN and an RNN.
input image size = [20, 3, 48, 48]
the CNN output size = [20, 64, 48, 48]
and now I want the CNN output to be the RNN input,
but as far as I know the input of an RNN must be 3-dimensional: [seq_len, batch, input_size].
How can I turn the 4-dimensional [20, 64, 48, 48] tensor into a 3-dimensional one for the RNN input?
You would have to decide which dimension(s) should be the temporal dimension (seq_len) and which the features (input_size).
E.g. you could treat the output channels as the features and the spatial dimensions (height and width) as the temporal dimension.
To do so, you could first flatten the spatial dimensions via:
output = output.view(output.size(0), output.size(1), -1)
and then permute the dimensions via:
output = output.permute(2, 0, 1)
If you create the RNN with batch_first=True instead, the permutation should be:
output = output.permute(0, 2, 1)
since the RNN then expects the input as [batch_size, seq_len, features].
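Putting both steps together, here is a minimal runnable sketch of the two layouts (the RNN hidden size of 64 is just a placeholder, not something the thread prescribes):

```python
import torch
import torch.nn as nn

# Assumed CNN output from the example above: [batch=20, channels=64, height=48, width=48]
output = torch.randn(20, 64, 48, 48)

# Flatten the spatial dimensions: [20, 64, 48*48] = [20, 64, 2304]
output = output.view(output.size(0), output.size(1), -1)

# Default layout [seq_len, batch, input_size]:
# treat each spatial position as a time step, channels as features
x = output.permute(2, 0, 1)  # [2304, 20, 64]
rnn = nn.RNN(input_size=64, hidden_size=64)
out, h = rnn(x)
print(out.shape)  # torch.Size([2304, 20, 64])

# With batch_first=True the layout is [batch, seq_len, input_size]
x2 = output.permute(0, 2, 1)  # [20, 2304, 64]
rnn_bf = nn.RNN(input_size=64, hidden_size=64, batch_first=True)
out2, h2 = rnn_bf(x2)
print(out2.shape)  # torch.Size([20, 2304, 64])
```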
Many thanks @ptrblck
After your solution my input size changed to [2304, 20, 64],
and after the RNN layer the output has the same size, so I tried to reshape it back to [20, 64, 48, 48]
because my network is CNN > RNN > CNN.
But somehow, after passing through the RNN layer, some data in the tensor has changed.
While I'm trying to pass the output back into the CNN I get this error:
TypeError: conv2d(): argument 'input' (position 1) must be Tensor, not tuple
Do you have any idea about this problem?
nn.RNN outputs a tuple of (output, hidden_state) as described in the docs, so you might want to use only one of these outputs for further processing.
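A small sketch of unpacking that tuple and reshaping back for the next conv layer (the hidden size of 64 is an assumption chosen so the reshape to [20, 64, 48, 48] works):

```python
import torch
import torch.nn as nn

x = torch.randn(20, 64, 48, 48)            # output of the first CNN
seq = x.view(20, 64, -1).permute(2, 0, 1)  # [2304, 20, 64]

rnn = nn.RNN(input_size=64, hidden_size=64)
out, hidden = rnn(seq)                     # unpack the tuple here

# Pass only `out` (a Tensor, not the tuple) to the next conv layer
out = out.permute(1, 2, 0).view(20, 64, 48, 48)
conv = nn.Conv2d(64, 64, kernel_size=3, padding=1)
y = conv(out)
print(y.shape)  # torch.Size([20, 64, 48, 48])
```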
Hi,
I am trying to feed the output features of my 3DCNN to a GRU.
The inputs to my 3DCNN are videos processed and stored in numpy arrays of shape
[70, 1, 29, 88, 88], where the dimensions correspond to [batch size, num of channels, number of frames, height, width].
here is my 3DCNN
class CNN3D(nn.Module):
    def __init__(self, t_dim=29, img_x=88, img_y=88, drop_p=0.2, fc_hidden1=256, fc_hidden2=128, num_classes=2):
        super(CNN3D, self).__init__()
        # set video dimensions
        self.t_dim = t_dim
        self.img_x = img_x
        self.img_y = img_y
        # fully connected layer hidden nodes
        self.fc_hidden1, self.fc_hidden2 = fc_hidden1, fc_hidden2
        self.drop_p = drop_p
        self.num_classes = num_classes
        self.ch1, self.ch2 = 32, 48
        self.k1, self.k2 = (5, 5, 5), (3, 3, 3)    # 3d kernel sizes
        self.s1, self.s2 = (2, 2, 2), (2, 2, 2)    # 3d strides
        self.pd1, self.pd2 = (0, 0, 0), (0, 0, 0)  # 3d padding
        # compute conv1 & conv2 output shapes
        self.conv1_outshape = conv3D_output_size((self.t_dim, self.img_x, self.img_y), self.pd1, self.k1, self.s1)
        self.conv2_outshape = conv3D_output_size(self.conv1_outshape, self.pd2, self.k2, self.s2)
        self.conv1 = nn.Conv3d(in_channels=1, out_channels=self.ch1, kernel_size=self.k1, stride=self.s1,
                               padding=self.pd1)
        self.bn1 = nn.BatchNorm3d(self.ch1)
        self.conv2 = nn.Conv3d(in_channels=self.ch1, out_channels=self.ch2, kernel_size=self.k2, stride=self.s2,
                               padding=self.pd2)
        self.bn2 = nn.BatchNorm3d(self.ch2)
        self.relu = nn.ReLU(inplace=True)
        self.drop = nn.Dropout3d(self.drop_p)
        self.pool = nn.MaxPool3d(2)
        self.fc1 = nn.Linear(self.ch2 * self.conv2_outshape[0] * self.conv2_outshape[1] * self.conv2_outshape[2],
                             self.fc_hidden1)  # fully connected hidden layer
        self.fc2 = nn.Linear(self.fc_hidden1, self.fc_hidden2)
        self.fc3 = nn.Linear(self.fc_hidden2, self.num_classes)  # fully connected layer, output = multi-classes

    def forward(self, x_3d):
        # print(x_3d.shape)  # [70, 1, 29, 88, 88]
        # Conv 1
        x = self.conv1(x_3d)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.drop(x)
        # print(x.shape)  # [70, 32, 13, 42, 42]
        # Conv 2
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu(x)
        x = self.drop(x)
        # print(x.shape)  # [70, 48, 6, 20, 20]
        # FC 1 and 2
        x = x.view(x.size(0), -1)
        # print(x.shape)  # [70, 115200]
        x = F.relu(self.fc1(x))
        # print(x.shape)  # [70, 256]
        x = F.relu(self.fc2(x))
        # print(x.shape)  # [70, 128]
        x = F.dropout(x, p=self.drop_p, training=self.training)
        x = self.fc3(x)
        # print(x.shape)  # [70, 2]
        return x
I tried to add a GRU layer before the linear layers, but I couldn't, since the GRU expects input of shape (batch size, sequence length, input size) and my 3DCNN reduces the number of frames from 29.
How can I feed the features extracted by the 3DCNN to a GRU?
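One common approach (a sketch, not the only option) is to skip the flattening to 2D before the classifier and keep the temporal dimension: treat the conv output [N, C, T', H', W'] as a sequence of length T' whose per-step features are C*H'*W'. The GRU's input_size only has to match those flattened features; the sequence length does not need to equal the original 29 frames, so the reduction to 6 frames is fine. The hidden size of 128 and classifying from the last time step are assumptions for illustration:

```python
import torch
import torch.nn as nn

# Assumed shape after conv2 in the model above: [N=70, C=48, T=6, H=20, W=20]
x = torch.randn(70, 48, 6, 20, 20)

# Move the (reduced) frame dimension to the sequence axis:
# [N, C, T, H, W] -> [N, T, C, H, W] -> [N, T, C*H*W]
n, c, t, h, w = x.shape
seq = x.permute(0, 2, 1, 3, 4).reshape(n, t, c * h * w)  # [70, 6, 19200]

gru = nn.GRU(input_size=c * h * w, hidden_size=128, batch_first=True)
out, hidden = gru(seq)           # out: [70, 6, 128]

# Classify e.g. from the last time step
fc = nn.Linear(128, 2)
logits = fc(out[:, -1])
print(logits.shape)  # torch.Size([70, 2])
```

A pooling layer (spatial average per frame) before the flatten would shrink the 19200-dim feature vector considerably if memory becomes an issue.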