RuntimeError: shape '[1, 573, 447, 1]' is invalid for input of size 768393

I’m trying to train a fully convolutional network with input images that all have a different size.

Model FCN8s

import torch.nn as nn


class FCN8s(nn.Module):
    def __init__(self, n_class):
        super(FCN8s, self).__init__()
        # conv1
        self.conv1_1 = nn.Conv2d(3, 64, 3, padding=100)
        self.relu1_1 = nn.ReLU(inplace=True)
        self.conv1_2 = nn.Conv2d(64, 64, 3, padding=1)
        self.relu1_2 = nn.ReLU(inplace=True)
        self.pool1 = nn.MaxPool2d(2, stride=2, ceil_mode=True)  # 1/2

        # conv2
        self.conv2_1 = nn.Conv2d(64, 128, 3, padding=1)
        self.relu2_1 = nn.ReLU(inplace=True)
        self.conv2_2 = nn.Conv2d(128, 128, 3, padding=1)
        self.relu2_2 = nn.ReLU(inplace=True)
        self.pool2 = nn.MaxPool2d(2, stride=2, ceil_mode=True)  # 1/4

        # conv3
        self.conv3_1 = nn.Conv2d(128, 256, 3, padding=1)
        self.relu3_1 = nn.ReLU(inplace=True)
        self.conv3_2 = nn.Conv2d(256, 256, 3, padding=1)
        self.relu3_2 = nn.ReLU(inplace=True)
        self.conv3_3 = nn.Conv2d(256, 256, 3, padding=1)
        self.relu3_3 = nn.ReLU(inplace=True)
        self.pool3 = nn.MaxPool2d(2, stride=2, ceil_mode=True)  # 1/8

        # conv4
        self.conv4_1 = nn.Conv2d(256, 512, 3, padding=1)
        self.relu4_1 = nn.ReLU(inplace=True)
        self.conv4_2 = nn.Conv2d(512, 512, 3, padding=1)
        self.relu4_2 = nn.ReLU(inplace=True)
        self.conv4_3 = nn.Conv2d(512, 512, 3, padding=1)
        self.relu4_3 = nn.ReLU(inplace=True)
        self.pool4 = nn.MaxPool2d(2, stride=2, ceil_mode=True)  # 1/16

        # conv5
        self.conv5_1 = nn.Conv2d(512, 512, 3, padding=1)
        self.relu5_1 = nn.ReLU(inplace=True)
        self.conv5_2 = nn.Conv2d(512, 512, 3, padding=1)
        self.relu5_2 = nn.ReLU(inplace=True)
        self.conv5_3 = nn.Conv2d(512, 512, 3, padding=1)
        self.relu5_3 = nn.ReLU(inplace=True)
        self.pool5 = nn.MaxPool2d(2, stride=2, ceil_mode=True)  # 1/32

        # fc6
        self.fc6 = nn.Conv2d(512, 4096, 7)
        self.relu6 = nn.ReLU(inplace=True)
        self.drop6 = nn.Dropout2d()

        # fc7
        self.fc7 = nn.Conv2d(4096, 4096, 1)
        self.relu7 = nn.ReLU(inplace=True)
        self.drop7 = nn.Dropout2d()

        self.score_fr = nn.Conv2d(4096, n_class, 1)
        self.score_pool3 = nn.Conv2d(256, n_class, 1)
        self.score_pool4 = nn.Conv2d(512, n_class, 1)

        self.upscore2 = nn.ConvTranspose2d(
            n_class, n_class, 4, stride=2, bias=False)
        self.upscore8 = nn.ConvTranspose2d(
            n_class, n_class, 16, stride=8, bias=False)
        self.upscore_pool4 = nn.ConvTranspose2d(
            n_class, n_class, 4, stride=2, bias=False)

I get the following error, and don’t really understand how to fix it:

RuntimeError: shape '[1, 573, 447, 1]' is invalid for input of size 768393

The error is raised on line 32 of the trainer.py file in the torchfcn repository on GitHub:

log_p = log_p[target.view(n, h, w, 1).repeat(1, 1, 1, c) >= 0]

The failing operation tries to reshape the target via target.view(n, h, w, 1), where n, h, and w come from the input: n, c, h, w = input.size().
The expected shapes are given as:

# input: (n, c, h, w), target: (n, h, w)

which doesn't seem to be the case in your workflow.
Could you check the model output shape as well as the target shape and make sure that the batch dimension as well as the spatial dimensions are equal?
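For reference, the failure can be reproduced in isolation. The shapes below are taken from the error message (n = 1, h = 573, w = 447); the trailing factor of 3 is an assumption about where the extra elements come from:

```python
import torch

n, h, w = 1, 573, 447

# A target of class indices, as the script expects: (n, h, w)
good_target = torch.zeros(n, h, w, dtype=torch.long)
print(good_target.view(n, h, w, 1).shape)   # torch.Size([1, 573, 447, 1])

# A 3-channel, channels-last target has 3x as many elements,
# so the same view no longer fits:
bad_target = torch.zeros(n, h, w, 3, dtype=torch.long)
print(bad_target.numel())                   # 768393 = 1 * 573 * 447 * 3
try:
    bad_target.view(n, h, w, 1)
except RuntimeError as e:
    print(e)  # shape '[1, 573, 447, 1]' is invalid for input of size 768393
```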

Thanks for your reply! Hopefully this provides enough information.

The input and target:

train_loader = torch.utils.data.DataLoader(train_data, batch_size=1, shuffle=True, **kwargs)
for data, target in train_loader: break
print(data.shape)
print(target.shape)

Print result:

torch.Size([1, 3, 539, 456])
torch.Size([1, 539, 456, 3])
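
A quick sanity check against the expected (n, c, h, w) / (n, h, w) convention makes the mismatch visible; the tensors below simply mirror the printed shapes:

```python
import torch

data = torch.zeros(1, 3, 539, 456)     # input:  (n, c, h, w)
target = torch.zeros(1, 539, 456, 3)   # target: (n, h, w, 3)  <- extra trailing dim

n, c, h, w = data.size()
# The script needs target.numel() == n * h * w, but the trailing
# channel dimension makes it three times too large:
print(n * h * w)        # 245784
print(target.numel())   # 737352
```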

The output shape of each layer, printed with:

summary(model, (3, 539, 456))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 745, 478]           1,792
              ReLU-2         [-1, 64, 745, 478]               0
            Conv2d-3         [-1, 64, 745, 478]          36,928
              ReLU-4         [-1, 64, 745, 478]               0
         MaxPool2d-5         [-1, 64, 373, 239]               0
            Conv2d-6        [-1, 128, 373, 239]          73,856
              ReLU-7        [-1, 128, 373, 239]               0
            Conv2d-8        [-1, 128, 373, 239]         147,584
              ReLU-9        [-1, 128, 373, 239]               0
        MaxPool2d-10        [-1, 128, 187, 120]               0
           Conv2d-11        [-1, 256, 187, 120]         295,168
             ReLU-12        [-1, 256, 187, 120]               0
           Conv2d-13        [-1, 256, 187, 120]         590,080
             ReLU-14        [-1, 256, 187, 120]               0
           Conv2d-15        [-1, 256, 187, 120]         590,080
             ReLU-16        [-1, 256, 187, 120]               0
        MaxPool2d-17          [-1, 256, 94, 60]               0
           Conv2d-18          [-1, 512, 94, 60]       1,180,160
             ReLU-19          [-1, 512, 94, 60]               0
           Conv2d-20          [-1, 512, 94, 60]       2,359,808
             ReLU-21          [-1, 512, 94, 60]               0
           Conv2d-22          [-1, 512, 94, 60]       2,359,808
             ReLU-23          [-1, 512, 94, 60]               0
        MaxPool2d-24          [-1, 512, 47, 30]               0
           Conv2d-25          [-1, 512, 47, 30]       2,359,808
             ReLU-26          [-1, 512, 47, 30]               0
           Conv2d-27          [-1, 512, 47, 30]       2,359,808
             ReLU-28          [-1, 512, 47, 30]               0
           Conv2d-29          [-1, 512, 47, 30]       2,359,808
             ReLU-30          [-1, 512, 47, 30]               0
        MaxPool2d-31          [-1, 512, 24, 15]               0
           Conv2d-32          [-1, 4096, 18, 9]     102,764,544
             ReLU-33          [-1, 4096, 18, 9]               0
        Dropout2d-34          [-1, 4096, 18, 9]               0
           Conv2d-35          [-1, 4096, 18, 9]      16,781,312
             ReLU-36          [-1, 4096, 18, 9]               0
        Dropout2d-37          [-1, 4096, 18, 9]               0
           Conv2d-38             [-1, 8, 18, 9]          32,776
  ConvTranspose2d-39            [-1, 8, 38, 20]           1,024
           Conv2d-40            [-1, 8, 47, 30]           4,104
  ConvTranspose2d-41            [-1, 8, 78, 42]           1,024
           Conv2d-42            [-1, 8, 94, 60]           2,056
  ConvTranspose2d-43          [-1, 8, 632, 344]          16,384
================================================================
Total params: 134,317,912
Trainable params: 134,317,912
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 1.75
Forward/backward pass size (MB): 1599.66
Params size (MB): 512.38
Estimated Total Size (MB): 2113.80
----------------------------------------------------------------

@ptrblck I still haven't figured this out… I'm using a batch size of 1 because of the varying input image sizes.

The output of your model seems to have the shape [-1, 8, 632, 344] based on your model summary.
However, neither the printed target shape nor the error message fits this shape, so I'm unsure how the model is supposed to be used.

The script assumes that the model output and target shape match, while the currently posted shapes don’t really match.

Also, your target seems to use 3 channels in a channels-last format; note that 768393 = 573 × 447 × 3, so the extra factor of 3 in the error comes exactly from that trailing dimension.
What does dim3 represent?
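
If dim3 is an RGB color-coded label image, one common fix is to map each color to a class index before computing the loss, so the target becomes (n, h, w) as the script expects. A minimal sketch — the palette below is hypothetical and must be replaced with your dataset's actual label colors:

```python
import torch

# Hypothetical palette mapping mask colors to class indices.
# Replace with the colors actually used by your annotations.
PALETTE = {
    (0, 0, 0): 0,        # background
    (255, 0, 0): 1,      # class 1
    (0, 255, 0): 2,      # class 2
}

def rgb_mask_to_index(mask):
    """Convert an (H, W, 3) uint8 color mask to an (H, W) LongTensor of class ids."""
    h, w, _ = mask.shape
    index = torch.zeros(h, w, dtype=torch.long)
    for color, cls in PALETTE.items():
        # Pixels whose 3 channels all match this palette color
        match = (mask == torch.tensor(color, dtype=mask.dtype)).all(dim=-1)
        index[match] = cls
    return index

mask = torch.zeros(4, 4, 3, dtype=torch.uint8)
mask[0, 0] = torch.tensor([255, 0, 0], dtype=torch.uint8)
target = rgb_mask_to_index(mask)   # shape (4, 4), values in {0, 1}
print(target.shape)                # torch.Size([4, 4])
```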