Trying to build my first image classifier. Getting an error I don't understand

Hello, I’m working on my first real classifier that isn’t just a demo someone else made. I’m at the point of running the first forward pass of my own data through the network during training.

To be honest, I’m not quite sure how to determine the values I should use in the various layers of this network; that was part of the motivation for building a classifier myself. My network is currently defined as:


import torch
import torch.nn as nn

# Convolutional neural network (two convolutional layers)
class ConvNet(nn.Module):
    def __init__(self, num_classes=3):
        super(ConvNet, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=16, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(num_features=16),
            nn.ReLU(),
            #nn.MaxPool2d(kernel_size=2, stride=2))
            nn.AdaptiveMaxPool2d(output_size=16))
        self.layer2 = nn.Sequential(
            nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(num_features=32),
            nn.ReLU(),
            #nn.MaxPool2d(kernel_size=2, stride=2))
            nn.AdaptiveMaxPool2d(output_size=7*7*32))
        self.fc = nn.Linear(in_features=7*7*32, out_features=num_classes)
        
    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.reshape(out.size(0), -1)
        out = self.fc(out)
        return out

num_classes = 3  # my dataset has three classes
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ConvNet(num_classes).to(device)
print("Model defined")

For my first layer, I’m using 3 input channels because my images are all RGB. The other parameter settings came from an example I’m using as a reference, but I want to learn how to choose them myself.
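From what I’ve read, one way to see what each stage produces is to push a dummy batch through the layers and print the shapes. This is just a sketch, using a random batch shaped like my 224 x 224 RGB inputs:

with torch.no_grad():
    dummy = torch.randn(1, 3, 224, 224).to(device)
    out = model.layer1(dummy)
    print(out.shape)        # shape after layer1
    out = model.layer2(out)
    print(out.shape)        # shape after layer2
    print(out.reshape(out.size(0), -1).shape)  # flattened size that self.fc receives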

The stack trace I’m getting (below) is a little too cryptic for me to know where to go from here:

Traceback (most recent call last):
  File "main.py", line 97, in <module>
    outputs = model(images)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "main.py", line 77, in forward
    out = self.fc(out)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 55, in forward
    return F.linear(input, self.weight, self.bias)
  File "/usr/local/lib/python3.7/site-packages/torch/nn/functional.py", line 1024, in linear
    return torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch, m1: [2 x 78675968], m2: [1568 x 3] at /Users/soumith/code/builder/wheel/pytorch-src/aten/src/TH/generic/THTensorMath.cpp:2070

Thanks all for any help!


EDIT: After looking at the numbers in the runtime error for a couple hours I realized something.

I’m rescaling my input images to all be 224 x 224, which is a total of 50,176 pixels.

If I divide the first large number in the error message, 78,675,968, by 50,176, I get exactly 1,568, the second large number in the error message. So it seems like the fully connected layer is expecting input of one size (1,568 features) but actually receiving another (78,675,968).
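From the PyTorch docs, output_size in nn.AdaptiveMaxPool2d is the spatial size (H x W) of the pooled map, not the flattened feature count. So output_size=7*7*32 was asking for a 1568 x 1568 map per channel, and 32 * 1568 * 1568 = 78,675,968, exactly the first big number in the error. A tiny standalone check (with made-up sizes) shows the semantics:

import torch
import torch.nn as nn

pool = nn.AdaptiveMaxPool2d(output_size=4)   # 4 means a 4 x 4 spatial output
x = torch.randn(2, 32, 112, 112)             # made-up feature map: batch 2, 32 channels
out = pool(x)
print(out.shape)                             # torch.Size([2, 32, 4, 4])
print(out.reshape(out.size(0), -1).shape)    # torch.Size([2, 512]) -> 32 * 4 * 4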

Realized it was the AdaptiveMaxPool2d in layer2 (and the matching in_features on the linear layer) I needed to tinker with. Changed them to:

            nn.AdaptiveMaxPool2d(output_size=6))
        self.fc = nn.Linear(in_features=1152, out_features=3)
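Sanity-checking the new numbers with the same dummy-batch sketch as above: 32 channels * 6 * 6 = 1,152, which matches in_features.

with torch.no_grad():
    dummy = torch.randn(1, 3, 224, 224).to(device)
    out = model.layer2(model.layer1(dummy))
    print(out.shape)                            # torch.Size([1, 32, 6, 6])
    print(out.reshape(out.size(0), -1).shape)   # torch.Size([1, 1152])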

And my train and test phases work successfully! Only 77% accuracy, but I’ll see if I can improve it now.