Invalid argument 2: mismatch between the batch size of input (1) and that of target (50) only with certain NN architectures


I’m working on switching some code over from Keras to PyTorch, and I’m trying to get a network to train on CIFAR-100. So far, I’ve been bogged down by errors telling me, in one way or another, that my output and target sizes are mismatched or that the batch size and target size are mismatched. The only architecture that works and trains on CIFAR-100 is the one from the “Training a classifier” tutorial.

I can change the number of filters and the model still trains successfully, but changing the kernel size or adding/removing layers breaks it: it won’t train on CIFAR-100 and throws an error instead.

I’ll attach my code for your perusal. Please inform me of any fixes you know!

```python
import torch
import torchvision
import torchvision.transforms as transforms

# Convert images to tensors, then scale each RGB channel to roughly [-1, 1]
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR100(root='./data', train=True,
                                         download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=50,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR100(root='./data', train=False,
                                        download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=50,
                                         shuffle=False, num_workers=2)

# A tuple, not a one-shot generator, so the class names can be reused
classes = tuple(str(i) for i in range(100))
```

```python
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 60, 5)       # 32x32 -> 28x28
        self.pool = nn.MaxPool2d(2, 2)         # halves height and width
        self.conv2 = nn.Conv2d(60, 160, 5)     # 14x14 -> 10x10
        #self.conv3 = nn.Conv2d(160, 160, 5)
        self.fc1 = nn.Linear(160 * 5 * 5, 120) # assumes a 160 x 5 x 5 feature map
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 100)          # 100 CIFAR-100 classes

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))   # -> 60 x 14 x 14
        x = self.pool(F.relu(self.conv2(x)))   # -> 160 x 5 x 5
        #x = F.relu(self.conv3(x))
        # -1 lets view() infer the first (batch) dimension from the element count
        x = x.view(-1, 160 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()

import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
```

```python
for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs (Variable wrappers are unnecessary since PyTorch 0.4)
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        prediction = outputs.max(1)[1]  # index of the highest score per sample
        accuracy = prediction.eq(labels).float().mean().item() * 100
        if i % 1000 == 0:
            print('Train Step: {}\tLoss: {:.10f}\tAccuracy: {:.10f}'.format(i, loss.item(), accuracy))

print('Finished Training')
```

I’ve attached code that runs successfully for me on both Ubuntu 16.04 and Google Colab. If I uncomment the self.conv3 lines in both `__init__` and `forward`, the code breaks. If I change the kernel size in any of the convolution layers, the code also breaks.
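The breakage can be traced with the output-size formula that the PyTorch docs give for nn.Conv2d and nn.MaxPool2d (dilation 1): out = floor((in + 2 * padding - kernel_size) / stride) + 1. The pure-Python sketch below is not from the thread; the helper names (out_size, feature_size, rows_after_view) are hypothetical. It follows a 32x32 CIFAR image through the Net above and shows what x.view(-1, 160 * 5 * 5) does with the result, assuming batch_size=50 as in the loaders.

```python
def out_size(size, kernel, stride=1, padding=0):
    # Height/width produced by nn.Conv2d or nn.MaxPool2d with dilation=1;
    # PyTorch floors the division, which // mirrors for these positive sizes.
    return (size + 2 * padding - kernel) // stride + 1


def feature_size(k=5, conv3=False, conv3_padding=0, size=32):
    # Trace one 32x32 image through the layers of Net in the post.
    size = out_size(size, k)            # conv1
    size = out_size(size, 2, stride=2)  # pool
    size = out_size(size, k)            # conv2
    size = out_size(size, 2, stride=2)  # pool
    if conv3:
        size = out_size(size, k, padding=conv3_padding)  # conv3, as in forward()
    return size


def rows_after_view(batch, channels, size, flat=160 * 5 * 5):
    # x.view(-1, flat) keeps the element count and infers the first dimension,
    # so a changed spatial size shows up as a changed "batch size".
    total = batch * channels * size * size
    if total % flat:
        raise RuntimeError("view fails: %d elements not divisible by %d" % (total, flat))
    return total // flat


print(feature_size())                                      # 5, matches fc1 (160 * 5 * 5)
print(feature_size(conv3=True))                            # 1
print(rows_after_view(50, 160, feature_size(conv3=True)))  # 2 rows for 50 targets
print(feature_size(k=3))                                   # 6
print(rows_after_view(50, 160, feature_size(k=3)))         # 72 rows for 50 targets
print(feature_size(conv3=True, conv3_padding=2))           # 5 again
```

With conv3 enabled, each 50-image batch is reshaped into 2 rows of 4000 features; with kernel size 3 it becomes 72 rows. Either way, CrossEntropyLoss then receives a batch size that no longer matches the 50 targets.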

Thanks in advance for your help!

Edit: I can also add/take away Linear layers without any ill effects.

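Your issues are caused by the changed activation size when you modify the conv layers. fc1 expects exactly 160 * 5 * 5 input features, and x.view(-1, 160 * 5 * 5) folds whatever elements remain into the first dimension, so a different spatial size shows up as a wrong batch size instead of a clear shape error.
If you want to keep the output shapes, you have to set the padding for your layers, as it’s 0 by default.
E.g. for your current setup with kernel_size=5 and stride=1, you would need to add padding=2 to keep the shape, since the output size is (in + 2 * padding - kernel_size) / stride + 1 = in + 4 - 5 + 1 = in.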
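A minimal sketch of the padding fix, assuming a reasonably recent PyTorch (one that has torch.flatten). PaddedNet is a hypothetical name for the post’s Net with conv3 enabled; only conv3 gets padding=2, so its 5x5 input stays 5x5 and fc1 keeps its 160 * 5 * 5 inputs. Replacing x.view(-1, 160 * 5 * 5) with torch.flatten(x, 1) is an extra change not in the answer above: it keeps the batch dimension fixed, so a future size mismatch fails inside fc1 instead of altering the batch size.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PaddedNet(nn.Module):
    # Hypothetical variant of Net from the post with conv3 enabled.
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 60, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(60, 160, 5)
        self.conv3 = nn.Conv2d(160, 160, 5, padding=2)  # 5x5 in -> 5x5 out
        self.fc1 = nn.Linear(160 * 5 * 5, 120)          # unchanged thanks to padding
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 100)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # -> 60 x 14 x 14
        x = self.pool(F.relu(self.conv2(x)))  # -> 160 x 5 x 5
        x = F.relu(self.conv3(x))             # -> 160 x 5 x 5
        # Flatten everything except the batch dimension
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)


net = PaddedNet()
out = net(torch.randn(50, 3, 32, 32))  # dummy CIFAR-sized batch of 50 images
print(out.shape)  # torch.Size([50, 100])
```

If padding=2 were also added to conv1 and conv2, the map reaching the linear layers would be 8x8 (32 -> 32 -> 16 -> 16 -> 8), and fc1 would then need 160 * 8 * 8 inputs.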

Thank you! It’s sorted out now!

Appreciate the help.