[Help] Debugging ResNet on CIFAR10

Hello guys,

I followed this tutorial:
http://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html#convnet-as-fixed-feature-extractor

to apply ResNet to CIFAR-10. But when I ran my model, I got an error:

The model stops right at the AvgPool layer.

The error came from the backend engine, so I could not figure out why it happened. Could you help me out?

Thanks
P.S.: I switched to resnet50 and changed num_classes to 10 in the last fc layer.

The input to the network is expected to be in a BCHW form, i.e. a 4-dimensional Tensor, where the first dimension is the batch dimension, the second dimension is the number of image channels (3 for color, 1 for grayscale), the third dimension is the image height, and the fourth dimension is the image width.

Your input is 2048x1x1 according to your error message. So PyTorch thinks the last two dimensions are height and width, i.e. that you have a 1-pixel image. If you try to do 2x2 pooling on a single pixel, you get the error you see (2x2 pooling needs at least a 2x2 grid of pixels).
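To make the failure concrete, here is a minimal reproduction of what the pooling layer is being asked to do (the 2048x1x1 shape is taken from the error message; the batch size of 1 is an assumption):

```python
import torch
import torch.nn.functional as F

# The feature map from the error message: 2048 channels, 1x1 spatial size
x = torch.randn(1, 2048, 1, 1)

try:
    F.avg_pool2d(x, kernel_size=2)  # 2x2 pooling on a 1-pixel "image"
except RuntimeError as e:
    # PyTorch refuses: the computed output would have zero height and width
    print("pooling failed:", e)
```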

I suspect you have an error in the way you transform images into your input tensor. Are you using torchvision.datasets?

Stephen

Hi Stephen,

Actually, my original input is batch_size x channels x height x width.
I use torchvision.datasets. This is how I transform it:

transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])
trainset = torchvision.datasets.CIFAR10(root=args.datadir, train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=args.batch_size, shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root=args.datadir, train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=args.batch_size, shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
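A quick way to sanity-check the shapes coming out of a loader like this is to pull one batch and print it. The sketch below uses a fake in-memory dataset in place of CIFAR-10 so it runs without downloading anything; the shapes match what the real trainloader yields:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Stand-in for the CIFAR-10 trainset: 100 fake 3x32x32 images with labels 0-9
fake_images = torch.randn(100, 3, 32, 32)
fake_labels = torch.randint(0, 10, (100,))
loader = DataLoader(TensorDataset(fake_images, fake_labels), batch_size=4, shuffle=True)

images, labels = next(iter(loader))
print(images.shape)  # torch.Size([4, 3, 32, 32]) -- BCHW, as the network expects
```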

How big are CIFAR-10 images? I think they’re only 32x32, right? It’s possible that you are using a network that is “too deep” for such small images, because it is trying to do too much pooling / down-sampling.

Given the error you saw, I would double-check that (1) your input tensors really are BCHW, and (2) your input tensors have enough height and width to survive all the downsampling in your network.

Hi Stephen,

I think you are right. CIFAR-10 images are only 32x32, so your argument is reasonable.
I will try to remove the AvgPool layer, so that the last fc layer receives the 2048x1x1 feature map directly (2048 features after flattening) instead of the impossible 2048x0x0 output the pooling was trying to produce.

Thanks a lot.

Hi Stephen

I tried removing AvgPool, and it worked. Thanks a lot.

Hi,
How did you remove AvgPool? Could you post your code for removing the layer?
Thanks!

Hi, can you reach ~93% accuracy on the test set after removing the avgpool layer?