RuntimeError: Given groups=1, weight of size [16, 3, 5, 5], expected input[16, 84, 84, 3] to have 3 channels, but got 84 channels instead

I am getting the following error from `nn.Conv2d` in PyTorch. Can anyone tell me how to resolve this issue?

Here are the details of the CNN architecture:

```
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(
                in_channels=3,        # expects RGB input in [N, C, H, W] layout
                out_channels=16,
                kernel_size=5,
                stride=1,
                padding=2,            # "same" padding for kernel_size=5
            ),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),  # halves the spatial size
        )
        self.conv2 = nn.Sequential(
            nn.Conv2d(16, 32, 5, 1, 2),
            nn.ReLU(),
            nn.MaxPool2d(2),              # halves the spatial size again
        )
        self.out = nn.Linear(32 * 7 * 7, 2)

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = x.view(x.size(0), -1)   # flatten to [N, features]
        output = self.out(x)
        return output
```

And here are the details of the error that I get:

RuntimeError: Given groups=1, weight of size [16, 3, 5, 5], expected input[16, 84, 84, 3] to have 3 channels, but got 84 channels instead

Your input seems to be permuted: it looks as if you are trying to pass a channels-last tensor to the model.
Use `input = input.permute(0, 3, 1, 2).contiguous()` to permute the dimensions back to `[batch_size, channels, height, width]` and pass this tensor to the model. Additionally, use `.to(memory_format=torch.channels_last)` if you want to use this memory layout.
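For example, a minimal sketch (assuming a batch of 16 images of size 84x84x3, as in the error message):

```
import torch

x = torch.randn(16, 84, 84, 3)           # channels-last layout: [N, H, W, C]
x = x.permute(0, 3, 1, 2).contiguous()   # -> [N, C, H, W], which nn.Conv2d expects
print(x.shape)                           # torch.Size([16, 3, 84, 84])
```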

Thanks @ptrblck. This leads to another error, which I couldn’t solve either. How can this be resolved?

/tmp/ipykernel_2662432/118969748.py in forward(self, x)
     28         x = self.conv2(x)
     29         x = x.view(x.size(0), -1)
---> 30         output = self.out(x)
     31         return output

RuntimeError: mat1 and mat2 shapes cannot be multiplied (16x14112 and 1568x2)

And here is the code:

```
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()        
        self.conv1 = nn.Sequential(         
            nn.Conv2d(
                in_channels=3,              
                out_channels=16,            
                kernel_size=5,              
                stride=1,                   
                padding=2,                  
            ),                              
            nn.ReLU(),                      
            nn.MaxPool2d(kernel_size=2),    
        )
        self.conv2 = nn.Sequential(         
            nn.Conv2d(16, 32, 5, 1, 2),     
            nn.ReLU(),                      
            nn.MaxPool2d(2),                
        )        
        self.out = nn.Linear(32 * 7 * 7, 2)    
        
    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)       
        x = x.view(x.size(0), -1)       
        output = self.out(x)
        return output
```

```
for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(train_loader, 0):
                
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data
        inputs = inputs.permute(0, 3, 1, 2).contiguous()
        
        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs.float()).to(memory_format=torch.channels_last)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
```

The error is raised in `self.out(x)`, as the linear layer expects 32 * 7 * 7 = 1568 input features while the incoming activation has 14112 features. Use:

self.out = nn.Linear(14112, 2)

and it should work.
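To see where the 14112 comes from, here is a quick sketch (assuming 84x84 inputs, as in the first error message): each `MaxPool2d(2)` halves the spatial size, 84 → 42 → 21, so the flattened activation has 32 * 21 * 21 = 14112 features:

```
import torch
import torch.nn as nn

features = nn.Sequential(
    nn.Conv2d(3, 16, 5, 1, 2), nn.ReLU(), nn.MaxPool2d(2),   # 84x84 -> 42x42
    nn.Conv2d(16, 32, 5, 1, 2), nn.ReLU(), nn.MaxPool2d(2),  # 42x42 -> 21x21
)
out = features(torch.randn(1, 3, 84, 84))
print(out.shape)             # torch.Size([1, 32, 21, 21])
print(out.flatten(1).shape)  # torch.Size([1, 14112]) -> in_features for the linear layer
```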

Thanks @ptrblck. But after training, when I try to test the trained model, the same kind of error (shown below) occurs. I can’t keep changing the layer dimensions for testing. Is there any solution for this?

----> 1 outputs = net(images.permute(0, 3, 1, 2).contiguous())
RuntimeError: mat1 and mat2 shapes cannot be multiplied (16x38400 and 14112x2)

```
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()        
        self.conv1 = nn.Sequential(         
            nn.Conv2d(
                in_channels=3,              
                out_channels=16,            
                kernel_size=5,              
                stride=1,                   
                padding=2,                  
            ),                              
            nn.ReLU(),                      
            nn.MaxPool2d(kernel_size=2),    
        )
        self.conv2 = nn.Sequential(         
            nn.Conv2d(16, 32, 5, 1, 2),     
            nn.ReLU(),                      
            nn.MaxPool2d(2),                
        )        
        self.out = nn.Linear(14112, 2)    
        
    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)       
        x = x.view(x.size(0), -1)       
        output = self.out(x)
        return output 
for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(train_loader, 0):
                
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data
        inputs = inputs.permute(0, 3, 1, 2).contiguous()
        
        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        labels = torch.argmax(labels, dim=1)  # one-hot labels -> class indices (torch.argmax keeps this a tensor)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
    print(running_loss)
```

```
dataiter = iter(test_loader)
images, labels = next(dataiter)  # dataiter.next() is deprecated; use next(dataiter)

outputs = net(images.permute(0, 3, 1, 2).contiguous())
```

It seems that your test samples have a different size than the training samples.
You could either resize them all to the same size or use e.g. an adaptive pooling layer before the linear layer and define the desired output activation shape.
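A minimal sketch of the adaptive-pooling approach (assumptions: `nn.AdaptiveAvgPool2d` as the adaptive layer, a 7x7 target size, and 2 output classes as in the original model). The pooling layer forces a fixed spatial size, so the linear layer works for any input resolution:

```
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 16, 5, 1, 2), nn.ReLU(), nn.MaxPool2d(2))
        self.conv2 = nn.Sequential(
            nn.Conv2d(16, 32, 5, 1, 2), nn.ReLU(), nn.MaxPool2d(2))
        self.pool = nn.AdaptiveAvgPool2d((7, 7))  # fixed 7x7 output for any input size
        self.out = nn.Linear(32 * 7 * 7, 2)

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.pool(x)             # [N, 32, 7, 7] regardless of H and W
        x = x.view(x.size(0), -1)    # [N, 1568]
        return self.out(x)

# the same model now accepts training and test images of different sizes
net = Net()
print(net(torch.randn(2, 3, 84, 84)).shape)    # torch.Size([2, 2])
print(net(torch.randn(2, 3, 160, 120)).shape)  # torch.Size([2, 2])
```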