Output size is too small Runtime Error

I’m trying to build an image classifier with the dataset from TensorFlow for Poets, but it’s throwing a weird error that I can’t wrap my head around.

Traceback (most recent call last):
  File "lit.py", line 68, in <module>
    out = flolit(image)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "lit.py", line 45, in forward
    x = F.relu(self.pool(self.conv4(x)))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 277, in forward
    self.padding, self.dilation, self.groups)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 90, in conv2d
    return f(input, weight, bias)
RuntimeError: Given input size: (75 x 2 x 5). Calculated output size: (100 x -2 x 1). Output size is too small at /pytorch/torch/lib/THNN/generic/SpatialConvolutionMM.c:45
import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.optim as optim
from prepare_data import data_loader
import torch.nn.functional as F
import sys

class Flossifier(nn.Module):
    def __init__(self, numclass):
        super().__init__()  # must run before any submodule is assigned

        self.input_size = 1050

        self.conv1 = nn.Conv2d(3, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 30, kernel_size=5)
        self.conv3 = nn.Conv2d(30, 75, kernel_size=5)
        self.conv4 = nn.Conv2d(75, 100, kernel_size=5)
        # self.conv5 = nn.Conv2d(100, 175, kernel_size=5)
        self.pool = nn.MaxPool2d(2)

        # self.fc = nn.Linear(self.input_size, numclass)

        # Flat neural network
        self.neural_net = nn.Sequential(
            # first layer outputs 70 features so it matches the next layer's input
            nn.Linear(self.input_size, 70),
            nn.Linear(70, 45),
            nn.Linear(45, 20),
            nn.Linear(20, 7),
            nn.Linear(7, numclass)
        )

    def forward(self, x):
        in_size = x.size(0)

        x = F.relu(self.pool(self.conv1(x)))
        x = F.relu(self.pool(self.conv2(x)))
        x = F.relu(self.pool(self.conv3(x)))
        x = F.relu(self.pool(self.conv4(x)))
        # x = F.relu(self.pool(self.conv5(x)))

        x = x.view(in_size, -1)

        x = self.neural_net(x)

        return F.log_softmax(x, dim=1)

flolit = Flossifier(5)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(flolit.parameters(), lr=.0001)

for i, (image, label) in enumerate(data_loader):
    image = Variable(image)
    label = Variable(label)

    out = flolit(image)

    loss = criterion(out, label)

    sys.stdout.write('Loss: '+str(loss.data[0])[:5]+'\r')
    # sys.stdout.write('Loss: '+str(net.l1.weight.grad))


Here is my data loader:

from torchvision import transforms
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader
from PIL import Image

transform1 = transforms.Compose([
    transforms.Normalize(mean=[0.285, 0.256, 0.206], std=[0.129, 0.124, 0.125])
])

# img = Image.open('flower_photos/daisy/daisybug.jpg')
# imgTensor = transform(img)
# trans = transforms.ToPILImage()
# img2 = trans(imgTensor).convert('RGB')
# img2.show()

test_data = ImageFolder(root='flower_photos', transform=transform1)

data_loader = DataLoader(test_data, batch_size=10, shuffle=True)

Your activations become too small for the current model architecture.
Each conv layer uses a kernel size of 5 without padding, so the activation loses 4 pixels in both width and height.
After each conv layer, the pooling layer halves the activation's spatial size.

For an input of size [50, 70] you will run into this error, since the sizes would be:

x = F.relu(self.pool(self.conv1(x))) # [23, 33]
x = F.relu(self.pool(self.conv2(x))) # [9, 14]
x = F.relu(self.pool(self.conv3(x))) # [2, 5]
# output is too small for conv4 with kernel_size=5
x = F.relu(self.pool(self.conv4(x)))

Try padding=2 in the conv layers, which keeps the spatial size through each kernel_size=5 convolution, or modify your model architecture.
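To check the sizes without running the model, the arithmetic above can be sketched in plain Python. The helper names `conv_out` and `trace_shapes` are made up for this illustration, and the [50, 70] input size is taken from the example above; this is a rough sketch assuming stride 1 convolutions and 2x2 max pooling, not part of the original code.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Output size along one spatial dimension for a conv or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(height, width, num_blocks=4, kernel=5, padding=0, pool=2):
    """Return (height, width) after each conv + max-pool block.

    Appends None and stops if a convolution would yield a non-positive size,
    which is the situation reported as "Output size is too small".
    """
    shapes = []
    for _ in range(num_blocks):
        h = conv_out(height, kernel, padding)
        w = conv_out(width, kernel, padding)
        if h <= 0 or w <= 0:
            shapes.append(None)
            break
        # max pooling with kernel_size=pool and stride=pool
        height = conv_out(h, pool, stride=pool)
        width = conv_out(w, pool, stride=pool)
        shapes.append((height, width))
    return shapes


no_padding = trace_shapes(50, 70, padding=0)
with_padding = trace_shapes(50, 70, padding=2)
print(no_padding)    # [(23, 33), (9, 14), (2, 5), None]
print(with_padding)  # [(25, 35), (12, 17), (6, 8), (3, 4)]
```

Under these assumptions, padding=2 leaves a [3, 4] activation with 100 channels after the fourth block, i.e. 100 * 3 * 4 = 1200 flattened features, so the input size of the first linear layer would have to match that number.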

I have the same problem, but I have already used padding=2.

Could you post the model architecture so that we can have a look?
Most likely the activations are too small due to pooling or strided convolutions.
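One way to see where the activations shrink is to record every layer's output shape with forward hooks. The snippet below is only a sketch: the `features` stack is a hypothetical stand-in that mirrors the channel counts from the first post with padding=2 added, `record_shape` is a name invented here, and the [50, 70] input size is assumed from the earlier reply.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the conv part of a model (conv -> pool -> relu, four times).
features = nn.Sequential(
    nn.Conv2d(3, 10, kernel_size=5, padding=2), nn.MaxPool2d(2), nn.ReLU(),
    nn.Conv2d(10, 30, kernel_size=5, padding=2), nn.MaxPool2d(2), nn.ReLU(),
    nn.Conv2d(30, 75, kernel_size=5, padding=2), nn.MaxPool2d(2), nn.ReLU(),
    nn.Conv2d(75, 100, kernel_size=5, padding=2), nn.MaxPool2d(2), nn.ReLU(),
)

shapes = []


def record_shape(module, inputs, output):
    # Called after each layer's forward pass; stores the layer type and output shape.
    shapes.append((module.__class__.__name__, tuple(output.shape)))


handles = [layer.register_forward_hook(record_shape) for layer in features]

with torch.no_grad():
    features(torch.randn(1, 3, 50, 70))  # dummy batch with an assumed input size

for handle in handles:
    handle.remove()  # detach the hooks once the shapes are collected

for name, shape in shapes:
    print(name, shape)
```

If the forward pass fails partway through, the shapes recorded up to that point show which layer received an input that was already too small.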

Thanks for your help, I have solved the problem.

Thanks, it also helped me.