How to make output dimensions match input dimensions in CNN?

I have training samples of the following shape: (1000, 2). These are numeric sequences, each of length 1000 and dimension 2. I need to build a convolutional neural network that outputs predictions/sequences of the same shape (1000, 2). However, after applying convolution and pooling, the height and width of the input are reduced. How should I set up the fully connected layer(s) and the output layer in my CNN so that the output dimensions match the input dimensions?

i.e. for each input sample of shape (1000, 2), how can I produce an output of the same shape (1000, 2), and how do I set up the last fully connected / output layers to achieve this in PyTorch?

As you know, a pooling layer reduces the spatial dimensions. A convolution can also reduce the size, depending on the stride, kernel size, and padding you choose. Therefore, if you want the size to stay unchanged, you must choose the stride, kernel size, and padding in a way that prevents dimension reduction.
Moreover, since pooling shrinks its input, if you want to keep the output the same size as the input, remove the pooling layers and adjust the kernel size, stride, and padding of the convolution layers, as in the sketch below.
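
For example, here is a minimal sketch (assuming your (1000, 2) sequences are laid out channels-first, as nn.Conv1d expects) showing that stride 1 with padding = (kernel_size - 1) // 2 leaves the length unchanged:

import torch
import torch.nn as nn

# Treat the (1000, 2) sequence as 2 input channels of length 1000.
x = torch.randn(8, 2, 1000)  # (batch, channels, length)

# stride=1 with padding=(kernel_size - 1)//2 keeps the length unchanged
conv = nn.Conv1d(in_channels=2, out_channels=16, kernel_size=5, stride=1, padding=2)

y = conv(x)
print(y.shape)  # torch.Size([8, 16, 1000]) -- length preserved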

Also, when you want to feed the output of the last convolution layer to a fully connected (fc) layer, you must pay attention to reshaping that output, because its shape is bs x c x h x w (batch_size, channels, height, width), while the input to an fc layer must be a flat vector per sample, i.e. shape (batch_size, n). So you must flatten it with this command:
x = x.view(-1, c * h * w)
This command reshapes the tensor to (batch_size, c * h * w). A sample model on the MNIST dataset, with the flattening done in the forward function, is shown below. Please note that this sample code does not try to keep the size of the input image unchanged.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvNet(nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()
        # No padding, so each 5x5 conv shrinks the image by 4 pixels per side pair
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=(5, 5), stride=(1, 1))
        self.conv2 = nn.Conv2d(in_channels=16, out_channels=64, kernel_size=(5, 5), stride=(1, 1))
        self.fc1 = nn.Linear(in_features=4*4*64, out_features=256)
        self.fc2 = nn.Linear(in_features=256, out_features=10)

    def forward(self, x):
        x = self.conv1(x)          # 28x28 -> 24x24
        x = F.relu(x)
        x = F.max_pool2d(x, 2, 2)  # 24x24 -> 12x12
        x = self.conv2(x)          # 12x12 -> 8x8
        x = F.relu(x)
        x = F.max_pool2d(x, 2, 2)  # 8x8 -> 4x4
        # Flatten before feeding the fully connected layers
        x = x.view(-1, 4*4*64)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x
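
As a quick sanity check (a hypothetical usage snippet), you can push a dummy MNIST-sized batch through the model and confirm the shapes follow 28x28 -> 24x24 -> 12x12 -> 8x8 -> 4x4:

model = ConvNet()
x = torch.randn(16, 1, 28, 28)  # dummy batch of MNIST-sized images
out = model(x)
print(out.shape)                # torch.Size([16, 10])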

I hope this guide can help you!


Thank you very much for your answer. I actually have a more general question, and I was wondering if you have any suggestions. I have the samples of sequences that I mentioned in my question, and I need to build a CNN (not an RNN) for sequence prediction. I'm not given any target sequences. If I were to use an RNN, I would use all but the last time step of each sequence as the training sequence, and all but the first time step as the target sequence. But in the case of a CNN, should the training and target sequences be the same? Is this correct?

Also, is there any CNN architecture that is appropriate for n-dimensional sequence prediction? In my case, I'd need the predicted sequence to have the same dimensions as the input sequences, since I'm using MSE loss for evaluation.

Check out fully convolutional networks (FCNs) for semantic segmentation, which use a skip architecture to keep the output the same size as the input.
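
For instance, here is a minimal sketch of that idea adapted to your 1D case: a hypothetical encoder-decoder with one skip connection (layer widths and kernel sizes are just placeholders), mapping (batch, 2, 1000) back to (batch, 2, 1000):

import torch
import torch.nn as nn

class FCN1d(nn.Module):
    """Fully convolutional net: input (batch, 2, 1000) -> output (batch, 2, 1000)."""
    def __init__(self):
        super(FCN1d, self).__init__()
        # Encoder: halve the length with a strided conv instead of pooling
        self.down = nn.Conv1d(2, 16, kernel_size=5, stride=2, padding=2)            # 1000 -> 500
        # Decoder: a transposed conv restores the original length
        self.up = nn.ConvTranspose1d(16, 16, kernel_size=4, stride=2, padding=1)    # 500 -> 1000
        # Skip connection: concatenate the raw input with the upsampled features
        self.head = nn.Conv1d(16 + 2, 2, kernel_size=5, stride=1, padding=2)        # 1000 -> 1000

    def forward(self, x):
        feat = torch.relu(self.down(x))
        up = torch.relu(self.up(feat))
        out = self.head(torch.cat([up, x], dim=1))  # skip connection
        return out

model = FCN1d()
x = torch.randn(4, 2, 1000)
print(model(x).shape)  # torch.Size([4, 2, 1000])

Since every layer is convolutional, there is no fully connected layer fixing the output size: the transposed convolution undoes the strided downsampling, and the skip connection lets the network reuse the raw input when reconstructing the sequence, so the output shape matches the input and works directly with MSE loss.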