Error in the input dimension

I’m new to PyTorch and I’m trying to build a CNN which is structured as follows:

Net(
(conv1): Conv2d(1, 32, kernel_size=(5, 5), stride=(1, 1))
(conv2): Conv2d(32, 64, kernel_size=(5, 5), stride=(1, 1))
(fc1): Linear(in_features=576, out_features=256, bias=True)
(fc2): Linear(in_features=256, out_features=128, bias=True)
(out): Linear(in_features=128, out_features=10, bias=True)
(act): ReLU()
(mp): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)

My inputs are batches of 32 images from the MNIST digits dataset, with shape [32, 1, 28, 28]. The error is the following:

Given groups=1, weight of size 64 32 5 5, expected input[32, 1, 28, 28] to have 32 channels, but got 1 channels instead

I read a lot of topics about the same error. My input is formatted as batches of 32 samples, with 1 channel and 28x28 images, and I need to pass that same shape through the network, but I don’t properly understand how to do that. Here is the code of the network:

class Net(nn.Module):
    
    # Constructor of the class; when an object is created it takes (Ni, Nh1, Nh2, No) as parameters
    # self is the variable that represents the instance of the object itself
    def __init__(self, Ni, Nh1, Nh2, No):
        # super() lets you avoid referring to the base class explicitly
        super(Net, self).__init__()
        # Defining the blocks
        '''
        Parameters:
            in_channels, out_channels, kernel_size, stride=1, padding=0, 
            dilation=1, groups=1, bias=True, padding_mode='zeros'
        '''
        self.conv1 = nn.Conv2d(in_channels=Ni, out_channels=Nh1, kernel_size=5, stride=1)
              
        self.conv2 = nn.Conv2d(in_channels=Nh1, out_channels=Nh2, kernel_size=5, stride=1)
        
        self.fc1 = nn.Linear(in_features=Nh2*3*3, out_features=256)
        self.fc2 = nn.Linear(in_features=256, out_features=128)
        self.out = nn.Linear(in_features=128, out_features=No)
        
        self.act = nn.ReLU()
        
        self.mp = nn.MaxPool2d(kernel_size=2, stride=2)
        
    def forward(self, x):
        
        layer1_out = self.mp(self.act(self.conv1(x)))
        layer2_out = self.mp(self.act(self.conv2(x)))
        
        output_flatten = layer2_out.x.view(-1,12*4*4)
        
        output_fc1 = self.act(self.fc1(output_flatten))
        output_fc2 = self.act(self.fc2(output_fc1))
        output = self.out(output_fc2)
        
        return output

Ni = 1, Nh1 = batchSize, Nh2 = 2*batchSize, No = 10
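
With batchSize = 32 this corresponds to creating the network roughly like this (model is just the name I use for the instance here):

model = Net(Ni=1, Nh1=32, Nh2=64, No=10)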

Thank you in advance!

Probably I’m misunderstanding the in_channels and out_channels values of Conv2d.

One more piece of info: I’m preparing the data in this way:

torch_X_train = torch.from_numpy(X_train).type(torch.LongTensor)
torch_y_train = torch.from_numpy(y_train).type(torch.LongTensor) # data type is long

Then I create feature and target tensors for the test set:

torch_X_test = torch.from_numpy(X_test).type(torch.LongTensor)
torch_y_test = torch.from_numpy(y_test).type(torch.LongTensor) # data type is long

torch_X_train = torch_X_train.view(-1, 1,28,28).float()
torch_X_test = torch_X_test.view(-1,1,28,28).float()
print(torch_X_train.shape)
print(torch_X_test.shape)

PyTorch train and test sets:

train = torch.utils.data.TensorDataset(torch_X_train,torch_y_train)
test = torch.utils.data.TensorDataset(torch_X_test,torch_y_test)

Data loaders:

train_loader = torch.utils.data.DataLoader(train, batch_size = n_batch, shuffle = False)
test_loader = torch.utils.data.DataLoader(test, batch_size = n_batch, shuffle = False)

self.conv2(x) should get the output of self.conv1, which is layer1_out.

Also, layer2_out.x.view(-1,12*4*4) seems to be wrong (additional x before the view call).
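
For reference, a minimal sketch of a corrected forward (keeping your layer names) could look like this; note that the flattened size then has to match the in_features of self.fc1:

def forward(self, x):
    layer1_out = self.mp(self.act(self.conv1(x)))
    # feed the output of the first block into conv2, not the raw input
    layer2_out = self.mp(self.act(self.conv2(layer1_out)))
    # flatten everything except the batch dimension
    output_flatten = layer2_out.view(layer2_out.size(0), -1)
    output_fc1 = self.act(self.fc1(output_flatten))
    output_fc2 = self.act(self.fc2(output_fc1))
    return self.out(output_fc2)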

Thank you! Actually, I changed the whole architecture:

class Net(nn.Module):
    
    def __init__(self, Fm1, Fm2, Fm3, Ks, Ni, Nh1, No, act, drop_rate):
        # super() lets you avoid referring to the base class explicitly
        super().__init__()

        '''
          in_channels = 1 since the images are grayscale
          out_channels = number of feature maps you want in the output
          Fm1 = feature maps of conv layer 1
          Fm2 = feature maps of conv layer 2
          Fm3 = feature maps of conv layer 3
          Ks = kernel size
          Ni = input features of the first linear layer (= Fm3)
          Nh1 = number of neurons of the first hidden dense layer
          No = output size = 10
          act = activation function chosen
          drop_rate = dropout probability
        '''
        
        self.conv1 = nn.Sequential(
            
            nn.Conv2d(in_channels=1, out_channels=Fm1, kernel_size=Ks, stride=1),
            nn.ReLU(),
            nn.BatchNorm2d(Fm1),
            nn.MaxPool2d(kernel_size=2),
            nn.Dropout(drop_rate)
        )

        self.conv2 = nn.Sequential(
            nn.Conv2d(in_channels=Fm1, out_channels=Fm2, kernel_size=Ks, stride=1),
            nn.ReLU(),
            nn.BatchNorm2d(Fm2),
            nn.MaxPool2d(kernel_size=2),
            nn.Dropout(drop_rate)
        )

        self.conv3 = nn.Sequential(
            nn.Conv2d(in_channels=Fm2, out_channels=Fm3, kernel_size=Ks, stride=1),
            nn.ReLU(),
            nn.BatchNorm2d(Fm3),
            nn.MaxPool2d(kernel_size=2),
            nn.Dropout(drop_rate)
        )
        
        self.fc1 = nn.Sequential(
            nn.Linear(Fm3, Nh1)
        )
        
        self.fc2 = nn.Sequential(
           nn.Linear(Nh1,No) 
        )
              
    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        
        #print("X prev: ", x.shape)
        x = x.view(x.size(0),-1)
        #print("X after: ", x.shape)
        x = self.fc1(x)
        x = self.fc2(x)
        x = F.log_softmax(x, dim=1)
        return x

All parameters are explained in the docstring. Is there something that I’m doing wrong? Should Ni always be equal to the number of feature maps of the last conv layer? Do you have suggestions for improving it?

I would suggest adding an activation function after self.fc1, since otherwise you are basically just applying a single linear layer (two stacked linear layers without a non-linearity in between collapse into one).
Besides that, your architecture looks alright. Do you see any other issues when using it?
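
One way to do that, keeping your nn.Sequential style, would be e.g.:

self.fc1 = nn.Sequential(
    nn.Linear(Fm3, Nh1),
    nn.ReLU()
)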

Should Ni always be equal to the number of feature maps of the last conv layer?

Actually, I would like to do cross-validation to find the best configuration of parameters (feature maps, learning rate, dropout rate, etc.). In fact, I built the network so that I can pass all these parameters manually. Is there something like GridSearch? Are there simple examples for model selection?

It seems you are not using Ni in your model architecture, but Fm3 instead.
The number of input features to the first linear layer should match the (flattened) output shape of the preceding layer.
If your conv layer outputs an activation of [batch_size, out_channels, 1, 1], in_features of the linear layer would be equal to out_channels.
However, if the spatial size is larger than 1x1, you would usually flatten the conv output and set in_features=out_channels*h*w.
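
If you are unsure about the activation shape, one way to check it is to pass a dummy input through the conv blocks and read the shape off directly (just a sketch; model stands for an instance of your Net and the input is assumed to be MNIST-sized):

with torch.no_grad():
    x = torch.randn(1, 1, 28, 28)                       # dummy grayscale 28x28 input
    out = model.conv3(model.conv2(model.conv1(x)))      # run only the conv blocks
print(out.shape)                                        # [1, Fm3, h, w]
in_features = out.size(1) * out.size(2) * out.size(3)   # value to pass to the first nn.Linear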

I haven’t used it personally, but skorch might give you a good interface to sklearn’s grid search etc.
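
A rough sketch of how that could look (untested, so treat the concrete values as placeholders; the module__ arguments are forwarded to Net.__init__):

from sklearn.model_selection import GridSearchCV
from skorch import NeuralNetClassifier

# wrap the PyTorch module so it behaves like an sklearn estimator;
# the default criterion (NLLLoss) matches your log_softmax output
net = NeuralNetClassifier(
    Net,
    max_epochs=10,
    lr=0.01,
    module__Fm1=16, module__Fm2=32, module__Fm3=64,
    module__Ks=3, module__Ni=64, module__Nh1=128,
    module__No=10, module__act=nn.ReLU(), module__drop_rate=0.2,
)

# hyperparameters to search over
params = {
    'lr': [0.01, 0.001],
    'module__drop_rate': [0.2, 0.5],
}

gs = GridSearchCV(net, params, cv=3, scoring='accuracy')
gs.fit(X_train.astype('float32').reshape(-1, 1, 28, 28), y_train.astype('int64'))
print(gs.best_score_, gs.best_params_)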
