RuntimeError: Given groups=1, weight[64, 3, 3, 3], so expected input[32, 64, 16, 16] to have 3 channels, but got 64 channels instead

Why am I getting this error?

RuntimeError: Given groups=1, weight[64, 3, 3, 3], so expected input[32, 64, 16, 16] to have 3 channels, but got 64 channels instead

I wrote an implementation of VGG-16:

class Network(nn.Module):

    def __init__(self):
        super(Network, self).__init__()

        self.feature = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.MaxPool2d(kernel_size=2, stride=2, dilation=1),

            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.MaxPool2d(kernel_size=2, stride=2, dilation=1),

            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.MaxPool2d(kernel_size=2, stride=2, dilation=1),

            nn.Conv2d(128, 256, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.MaxPool2d(kernel_size=2, stride=2, dilation=1),

            nn.Conv2d(256, 512, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.MaxPool2d(kernel_size=2, stride=2, dilation=1),

            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.MaxPool2d(kernel_size=2, stride=2, dilation=1),
        )

        self.fc = nn.Linear(512, 10)

    def forward(self, x):
        x = self.feature(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        return x

And this is the training loop.

for epoch in range(5):

    running_loss = 0.0

    for i, data in enumerate(trainloader):

        inputs, labels = data
        inputs, labels = Variable(inputs).cuda(), Variable(labels).cuda()

        optimizer.zero_grad()  # zero the gradient buffers of all parameters

        outputs = net(inputs)  # forward pass

        loss = criterion(outputs, labels)  # calculate the loss
        loss.backward()  # backpropagation
        optimizer.step()  # update the parameters

        running_loss += loss.data[0]

        if i % 100 == 99:
            print('[%d, %5d] loss: %.3f' % (epoch + 1, i + 1, running_loss / 100))
            running_loss = 0.0

print('Finish Training')


The problem is probably that your 3rd conv layer should expect 64 input channels (the previous layer outputs 64), but you declared it with only 3:


self.feature = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.ReLU(True),
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.ReLU(True),
    nn.MaxPool2d(kernel_size=2, stride=2, dilation=1),

    nn.Conv2d(64, 64, kernel_size=3, padding=1),  # <-- replace your line with this
    nn.ReLU(True),
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.ReLU(True),
    nn.MaxPool2d(kernel_size=2, stride=2, dilation=1),
    ...
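
If it helps, one way to catch this kind of mismatch early is to push a dummy batch through the layers one at a time and print the shape after each. A minimal sketch, assuming 32×32 inputs (which the error's 16×16 post-pool size suggests) and current PyTorch, where plain tensors work without Variable:

import torch
import torch.nn as nn

# Shape probe: feed a dummy batch through the first block and print the
# output shape after each layer to spot channel mismatches early.
block = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.ReLU(True),
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.ReLU(True),
    nn.MaxPool2d(kernel_size=2, stride=2),
)
x = torch.randn(1, 3, 32, 32)  # dummy 32x32 RGB input (an assumption)
for layer in block:
    x = layer(x)
    print(type(layer).__name__, tuple(x.shape))
# The next conv must use the printed channel count (64 here) as in_channels.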

Thank you for your answer. The issue was resolved by fixing the point you indicated.

But sorry, I have more questions.

I implemented the training code and the VGG network based on the PyTorch tutorial, but the loss does not converge.

This is the modified network structure.

class Network(nn.Module):

    def __init__(self):
        super(Network, self).__init__()

        self.feature = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.MaxPool2d(kernel_size=2, stride=2, dilation=1),

            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.MaxPool2d(kernel_size=2, stride=2, dilation=1),

            nn.Conv2d(128, 256, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.MaxPool2d(kernel_size=2, stride=2, dilation=1),

            nn.Conv2d(256, 512, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.MaxPool2d(kernel_size=2, stride=2, dilation=1),

            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(True),
            nn.MaxPool2d(kernel_size=2, stride=2, dilation=1),
        )

        self.fc = nn.Linear(512, 10)

    def forward(self, x):
        x = self.feature(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        return x
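
For a 32×32 input (CIFAR-10-sized, which the 10-class head and the error's 16×16 intermediate shape both suggest), the five stride-2 poolings halve the spatial size each time, 32 → 16 → 8 → 4 → 2 → 1, so the feature output flattens to exactly 512 and matches nn.Linear(512, 10). A quick sanity check with a dummy tensor (current PyTorch, no Variable needed):

import torch

net = Network()
out = net.feature(torch.randn(1, 3, 32, 32))  # dummy 32x32 RGB batch of 1
print(out.shape)  # torch.Size([1, 512, 1, 1]) -> flattens to 512 features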

And this is the training code.

for epoch in range(5):

    running_loss = 0.0

    for i, data in enumerate(trainloader):

        inputs, labels = data
        inputs, labels = Variable(inputs.cuda()), Variable(labels.cuda())

        optimizer.zero_grad()  # zero the gradient buffers of all parameters

        outputs = net(inputs)  # forward pass

        loss = criterion(outputs, labels)  # calculate the loss
        loss.backward()  # backpropagation
        optimizer.step()  # update the parameters

        running_loss += loss.data[0]

        if i % 100 == 99:
            print('[%d, %5d] loss: %.3f' % (epoch + 1, i + 1, running_loss / 100))
            running_loss = 0.0

print('Finish Training')
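
For completeness, the criterion and optimizer referenced above are never shown in the thread; a plausible reconstruction following the PyTorch CIFAR-10 tutorial would be:

import torch.nn as nn
import torch.optim as optim

# Hypothetical reconstruction -- the thread never shows these lines.
# CrossEntropyLoss matches the 10-way classification head above.
net = Network().cuda()
criterion = nn.CrossEntropyLoss()
# lr=0.001 and momentum=0.9 are the tutorial's defaults, assumed here;
# the values actually used in the thread are unknown.
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)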

Finally, this is the resulting loss.

[1, 100] loss: 2.305
[1, 200] loss: 2.305
[1, 300] loss: 2.304
[2, 100] loss: 2.305
[2, 200] loss: 2.304
[2, 300] loss: 2.304
[3, 100] loss: 2.304
[3, 200] loss: 2.304
[3, 300] loss: 2.304
[4, 100] loss: 2.304
[4, 200] loss: 2.305
[4, 300] loss: 2.304
[5, 100] loss: 2.304
[5, 200] loss: 2.304
[5, 300] loss: 2.305
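
The loss stays pinned near 2.304, which is exactly the cross-entropy of a uniform, chance-level guess over 10 classes, since -ln(1/10) = ln(10) ≈ 2.3026:

import math

# Cross-entropy of a uniform prediction over 10 classes: every class
# gets probability 1/10, so the loss is -ln(1/10) = ln(10).
print(math.log(10))  # 2.302585... -- matches the plateau above

So the network appears to be predicting at chance and not learning at all.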

Is there any problem with the training code or the network structure?

What loss criterion are you using?

I use CrossEntropyLoss.

Maybe try cross-entropy with logits.
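
(Worth noting: PyTorch's nn.CrossEntropyLoss already expects raw logits, since it combines LogSoftmax and NLLLoss internally, so switching losses is unlikely to help here. A quick check that the two formulations agree:)

import torch
import torch.nn as nn

# nn.CrossEntropyLoss applies log-softmax internally, so it must be fed
# raw logits -- it is equivalent to LogSoftmax followed by NLLLoss.
logits = torch.randn(4, 10)           # dummy batch of 4, 10 classes
targets = torch.tensor([1, 0, 4, 9])  # dummy class indices
a = nn.CrossEntropyLoss()(logits, targets)
b = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)
print(torch.allclose(a, b))  # True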

I have tried the function you suggested as well as other loss functions, but unfortunately none of them solves the problem.

Given that, there seems to be a problem with the network structure itself.

Therefore, I will check the network structure once more.

Thank you for your suggestion. 🙂