Linear Regression with CNN using Pytorch: input and target shapes do not match: input [400 x 1], target [200 x 1]

(Aditya Raj) #1

Let me explain the objective first. Let's say I have 1000 images, each with an associated quality score in the range 0-10. I am trying to perform image quality assessment using a CNN with regression (in PyTorch). I have divided the images into equal-sized patches and created a CNN to perform the regression.

Following is the code:

import torch.nn as nn
import torch.nn.functional as F

class MultiLabelNN(nn.Module):

    def __init__(self):
        super(MultiLabelNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(32, 64, 5)
        self.fc1 = nn.Linear(3200, 1024)
        self.fc2 = nn.Linear(1024, 512)
        self.fc3 = nn.Linear(512, 1)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.pool(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = x.view(-1, 3200)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x)
        x = F.relu(x)
        x = self.fc3(x)
        return x

Running this network gives the following error:

input and target shapes do not match: input [400 x 1], target [200 x 1]

The target shape is [200 x 1] because I use a batch size of 200. I found that if I change 3200 to 6400 in both “self.fc1 = nn.Linear(3200,1024)” and “x = x.view(-1, 3200)”, my code runs without any error.

Similarly, if I use 12800 instead of 6400, it throws the error: input and target shapes do not match: input [100 x 1], target [200 x 1]

What I don't understand is the reason behind this. If I am giving 200 images as input to my network, why does the input shape change when I change the parameters at the transition from the convolutional layers to the fully connected layers? I hope I have described the problem clearly; if anything is unclear, please ask. Any help would be appreciated. Thanks in advance.


Could you print the shape of x just before the .view call?
I think 3200 is the wrong number of features here, which is why your batch size increases.
You are pushing “everything left” into the batch dimension using view(-1, 3200). So if x has 6400 features per sample, the batch dimension will be doubled.
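
To make the arithmetic concrete, here is a small plain-Python sketch of how the -1 in view(-1, n) gets resolved. It does not call PyTorch; the helper name inferred_rows is made up for illustration, and the shape (200, 64, 10, 10) is an assumption chosen so that each sample has 6400 features, as your working value suggests.

```python
# Plain-Python sketch of how view(-1, n_cols) resolves the -1 dimension.
# Assumption: the tensor just before .view has shape (200, 64, 10, 10),
# i.e. a batch of 200 samples with 64 * 10 * 10 = 6400 features each.

def inferred_rows(shape, n_cols):
    """Return the size that -1 would take in view(-1, n_cols)."""
    total = 1
    for dim in shape:
        total *= dim
    if total % n_cols != 0:
        raise ValueError(f"{total} elements cannot be viewed as (-1, {n_cols})")
    return total // n_cols

conv_output_shape = (200, 64, 10, 10)  # assumed shape, see note above

rows_3200 = inferred_rows(conv_output_shape, 3200)    # two rows per sample
rows_6400 = inferred_rows(conv_output_shape, 6400)    # one row per sample
rows_12800 = inferred_rows(conv_output_shape, 12800)  # two samples per row

print(rows_3200, rows_6400, rows_12800)  # 400 200 100
```

Those three numbers match the leading dimensions in the reported errors (400, 200 and 100), so the total number of elements stays the same and only the split between batch and features changes.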
You could use x = x.view(x.size(0), -1) instead, which keeps the batch dimension fixed and will give you a size mismatch error in the next linear layer if the number of input features is wrong.
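
To find the right in_features for fc1 without trial and error, you can trace the spatial size through the layers by hand. The sketch below assumes 32 x 32 single-channel patches, which is not stated in the thread; it is simply the size that is consistent with 6400 working. The helper names conv_out and pool_out are made up for illustration, and they use the standard output-size formula for unpadded, undilated convolution and pooling.

```python
# Sketch: trace the spatial size through conv1 -> pool -> conv2.
# Assumption: input patches are 32 x 32 with one channel (not stated in the thread).

def conv_out(size, kernel, stride=1, padding=0):
    """Output size of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel, stride):
    """Output size of max pooling along one spatial dimension."""
    return (size - kernel) // stride + 1

patch_size = 32                      # assumed patch height and width
size = conv_out(patch_size, 5)       # after conv1 (kernel 5): 28
size = pool_out(size, 2, 2)          # after MaxPool2d(2, 2): 14
size = conv_out(size, 5)             # after conv2 (kernel 5): 10

out_channels = 64                    # conv2 output channels
in_features = out_channels * size * size

print(in_features)  # 6400
```

Under that assumption, nn.Linear(6400, 1024) together with x.view(x.size(0), -1) keeps the batch dimension at 200. For a different patch size, change patch_size and use the resulting in_features.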