RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 170 and 171 in dimension 3 at /pytorch/aten/src/THC/generic/THCTensorMath.cu:111

I am receiving this error

File "/erfnet_pytorch-master/train/erfnet.py", line 20, in forward
 output = torch.cat([self.conv(input), self.pool(input)], 1)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 170 and 171 in dimension 3 at /pytorch/aten/src/THC/generic/THCTensorMath.cu:111

And this is the part of the code that raises the error:

class DownsamplerBlock (nn.Module):
    def __init__(self, ninput, noutput):
        super(DownsamplerBlock, self).__init__()

        self.conv = nn.Conv2d(ninput, noutput-ninput, (3, 3), stride=2, padding=1, bias=True)
        self.pool = nn.MaxPool2d(2, stride=2)
        self.bn = nn.BatchNorm2d(noutput, eps=1e-3)

    def forward(self, input):
        print([self.conv(input).size(), self.pool(input).size()])
        output = torch.cat([self.conv(input), self.pool(input)], 1)
        output = self.bn(output)
        return F.relu(output)

The output of the print line is:

[(1, 13, 256, 341), (1, 3, 256, 341)]
[(1, 48, 128, 171), (1, 16, 128, 170)]

I can see that there is a mismatch in the last dimension, 170 vs. 171 … but I don’t know why.

Could you set ceil_mode=True for your nn.MaxPool2d layer and try it again?


Thank you … that solved the error.
But can you explain what this flag/option does? Why did I have a dimension mismatch in the first place, and how does setting it solve the problem?

Another thing: I now get a different error.


Traceback (most recent call last):
  File "main.py", line 506, in <module>
    main(parser.parse_args())
  File "main.py", line 460, in main
    model = train(args, model, True) #Train encoder
  File "main.py", line 232, in train
    loss = criterion(outputs, targets[:, 0])
  File "/home/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "main.py", line 82, in forward
    return self.loss(torch.nn.functional.log_softmax(outputs, dim=1), targets)
  File "/home/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/.local/lib/python2.7/site-packages/torch/nn/modules/loss.py", line 193, in forward
    self.ignore_index, self.reduce)
  File "/home/.local/lib/python2.7/site-packages/torch/nn/functional.py", line 1334, in nll_loss
    return torch._C._nn.nll_loss2d(input, target, weight, size_average, ignore_index, reduce)
RuntimeError: input and target batch or spatial sizes don't match: target [5 x 64 x 85], input [5 x 13 x 64 x 86] at /pytorch/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu:24

The reason is that your conv layer resamples the input differently than your pooling layer.
While both should halve the input in the spatial dimensions, the pooling layer uses the floor operation by default, resulting in floor(341/2) = 170 for the width. Setting ceil_mode=True will instead return 171, which matches the conv output.
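
Here is a minimal sketch of the behaviour (the spatial size matches your second block; the channel counts and the random input are just placeholders for this example):

import torch
import torch.nn as nn

x = torch.randn(1, 3, 256, 341)  # toy input

conv = nn.Conv2d(3, 13, 3, stride=2, padding=1)        # width: (341 + 2*1 - 3)//2 + 1 = 171
pool_floor = nn.MaxPool2d(2, stride=2)                  # width: floor((341 - 2)/2) + 1 = 170
pool_ceil = nn.MaxPool2d(2, stride=2, ceil_mode=True)   # width: ceil((341 - 2)/2) + 1 = 171

print(conv(x).shape)        # torch.Size([1, 13, 128, 171])
print(pool_floor(x).shape)  # torch.Size([1, 3, 128, 170])
print(pool_ceil(x).shape)   # torch.Size([1, 3, 128, 171])

With ceil_mode=True both branches produce a width of 171, so the torch.cat along dim=1 works again.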

Your new error seems to be related to the same issue: the target is one column smaller than the output (85 vs. 86 in the width).
Usually it’s simpler to use power-of-two sizes, but if that’s not possible in your use case, you would have to check which operation creates the size mismatch.
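
For reference, a rough sketch of the shapes the 2D NLL loss expects (the sizes are taken from your traceback; the random tensors are only placeholders):

import torch
import torch.nn.functional as F

output = torch.randn(5, 13, 64, 86)                          # [N, C, H, W] model output
target = torch.randint(0, 13, (5, 64, 86), dtype=torch.long)  # [N, H, W] class indices with the same H and W
loss = F.nll_loss(F.log_softmax(output, dim=1), target)       # works

# A target of shape [5, 64, 85] differs in the width (85 vs. 86) and raises the
# "input and target batch or spatial sizes don't match" error.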

The targets are simply the ground-truth labels, and the input is the output of the model.

This is how the Encoder works:

class Encoder(nn.Module):
    def __init__(self, num_classes):
        super(Encoder, self).__init__()
        self.initial_block = DownsamplerBlock(3,16)

        self.layers = nn.ModuleList()

        self.layers.append(DownsamplerBlock(16,64))

        for x in range(0, 5):    #5 times
           self.layers.append(non_bottleneck_1d(64, 0.03, 1)) 

        self.layers.append(DownsamplerBlock(64,128))

        for x in range(0, 2):    #2 times
            self.layers.append(non_bottleneck_1d(128, 0.3, 2))
            self.layers.append(non_bottleneck_1d(128, 0.3, 4))
            self.layers.append(non_bottleneck_1d(128, 0.3, 8))
            self.layers.append(non_bottleneck_1d(128, 0.3, 16))

        #Only in encoder mode:
        self.output_conv = nn.Conv2d(128, num_classes, 1, stride=1, padding=0, bias=True)

    def forward(self, input, predict=False):
        output = self.initial_block(input)
        print(output.size())
        for layer in self.layers:
            output = layer(output)
            print(output.size())
        exit()
        if predict:
            output = self.output_conv(output)

        return output

These are the outputs after each layer:

(1, 16, 256, 341)
(1, 64, 128, 171)
(1, 64, 128, 171)
(1, 64, 128, 171)
(1, 64, 128, 171)
(1, 64, 128, 171)
(1, 64, 128, 171)
(1, 128, 64, 86)
(1, 128, 64, 86)
(1, 128, 64, 86)
(1, 128, 64, 86)
(1, 128, 64, 86)
(1, 128, 64, 86)
(1, 128, 64, 86)
(1, 128, 64, 86)
(1, 128, 64, 86)
(1, 13, 64, 86)

And the whole downsampler:

class DownsamplerBlock (nn.Module):
    def __init__(self, ninput, noutput):
        super(DownsamplerBlock, self).__init__()

        self.conv = nn.Conv2d(ninput, noutput-ninput, (3, 3), stride=2, padding=1, bias=True)
        self.pool = nn.MaxPool2d(2, stride=2, ceil_mode=True)
        self.bn = nn.BatchNorm2d(noutput, eps=1e-3)

    def forward(self, input):
        #transforms.RandomCrop(224)
        output = torch.cat([self.conv(input), self.pool(input)], 1)
        output = self.bn(output)
        return F.relu(output)
    

class non_bottleneck_1d (nn.Module):
    def __init__(self, chann, dropprob, dilated):        
        super(non_bottleneck_1d, self).__init__()

        self.conv3x1_1 = nn.Conv2d(chann, chann, (3, 1), stride=1, padding=(1,0), bias=True)

        self.conv1x3_1 = nn.Conv2d(chann, chann, (1,3), stride=1, padding=(0,1), bias=True)

        self.bn1 = nn.BatchNorm2d(chann, eps=1e-03)

        self.conv3x1_2 = nn.Conv2d(chann, chann, (3, 1), stride=1, padding=(1*dilated,0), bias=True, dilation = (dilated,1))

        self.conv1x3_2 = nn.Conv2d(chann, chann, (1,3), stride=1, padding=(0,1*dilated), bias=True, dilation = (1, dilated))

        self.bn2 = nn.BatchNorm2d(chann, eps=1e-03)

        self.dropout = nn.Dropout2d(dropprob)
        

    def forward(self, input):

        output = self.conv3x1_1(input)
        output = F.relu(output)
        output = self.conv1x3_1(output)
        output = self.bn1(output)
        output = F.relu(output)

        output = self.conv3x1_2(output)
        output = F.relu(output)
        output = self.conv1x3_2(output)
        output = self.bn2(output)

        if (self.dropout.p != 0):
            output = self.dropout(output)
        
        return F.relu(output+input)    #+input = identity (residual connection)

I receive the error upon executing this line:
loss = criterion(outputs, targets[:, 0])
where the original targets size is (1, 1, 64, 85).

How did you resize the target?
Was it originally that size? If so, did you upsample your input image?

target = Resize(int(self.height/8), Image.NEAREST)(target)

class MyCoTransform(object):
    def __init__(self, enc, augment=True, height=512):
        self.enc=enc
        self.augment = augment
        self.height = height
        pass
    def __call__(self, input, target):
        # do something to both images
        input =  Resize(self.height, Image.BILINEAR)(input)
        target = Resize(self.height, Image.NEAREST)(target)

        if(self.augment):
            # Random hflip
            hflip = random.random()
            if (hflip < 0.5):
                input = input.transpose(Image.FLIP_LEFT_RIGHT)
                target = target.transpose(Image.FLIP_LEFT_RIGHT)
            
            #Random translation 0-2 pixels (fill rest with padding)
            transX = random.randint(-2, 2) 
            transY = random.randint(-2, 2)

            input = ImageOps.expand(input, border=(transX,transY,0,0), fill=0)
            target = ImageOps.expand(target, border=(transX,transY,0,0), fill=255) #pad label filling with 255
            input = input.crop((0, 0, input.size[0]-transX, input.size[1]-transY))
            target = target.crop((0, 0, target.size[0]-transX, target.size[1]-transY))   

        input = ToTensor()(input)
        if (self.enc):
            target = Resize(int(self.height/8), Image.NEAREST)(target)
        target = ToLabel()(target)
        target = Relabel(255, 19)(target)

        return input, target

Could you resize it to the desired shape?

target = Resize((64, 86), Image.NEAREST)(target)

I received this error


Traceback (most recent call last):
  File "main.py", line 512, in <module>
    main(parser.parse_args())
  File "main.py", line 466, in main
    model = train(args, model, True) #Train encoder
  File "main.py", line 215, in train
    for step, (images, labels) in enumerate(loader):
  File "/home/.local/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 286, in __next__
    return self._process_next_batch(batch)
  File "/home/.local/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 307, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
ValueError: Traceback (most recent call last):
  File "/home/.local/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 57, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/erfnet_pytorch-master/train/dataset.py", line 95, in __getitem__
    image, label = self.co_transform(image, label)
  File "main.py", line 68, in __call__
    target = Resize(64, 86)(target)
  File "build/bdist.linux-x86_64/egg/torchvision/transforms/transforms.py", line 175, in __call__
    return F.resize(img, self.size, self.interpolation)
  File "build/bdist.linux-x86_64/egg/torchvision/transforms/functional.py", line 204, in resize
    return img.resize((ow, oh), interpolation)
  File "/app/anaconda2/envs/tf-1.2/lib/python2.7/site-packages/PIL/Image.py", line 1695, in resize
    raise ValueError("unknown resampling filter")
ValueError: unknown resampling filter

Did you pass the size as a tuple, i.e. (64, 86)?
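
Judging by the traceback (target = Resize(64, 86)(target)), the two numbers were passed as separate positional arguments, so 86 is interpreted as the interpolation argument and PIL rejects it as an unknown resampling filter. A small sketch of the intended call (assuming target is a PIL image):

from PIL import Image
from torchvision.transforms import Resize

# Resize(64, 86)  -> 86 becomes the interpolation -> ValueError: unknown resampling filter
resize = Resize((64, 86), Image.NEAREST)  # size as a (height, width) tuple, NEAREST interpolation
# target = resize(target)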

Ah okay, I passed it as a tuple now and that worked,
but I got another error :)


main.py:292: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  inputs = Variable(images, volatile=True)    #volatile flag makes it free backward or outputs for eval
main.py:293: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  targets = Variable(labels, volatile=True)
main.py:297: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
  epoch_loss_val.append(loss.data[0])
Exception NameError: "global name 'FileNotFoundError' is not defined" in <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7f877c13ef10>> ignored
Traceback (most recent call last):
  File "main.py", line 507, in <module>
    main(parser.parse_args())
  File "main.py", line 461, in main
    model = train(args, model, True) #Train encoder
  File "main.py", line 304, in train
    iouEvalVal.addBatch(outputs.max(1)[1].unsqueeze(1).data, targets.data)
  File "/erfnet_pytorch-master/train/iouEval.py", line 61, in addBatch
    tp = torch.sum(torch.sum(torch.sum(tpmult, dim=0, keepdim=True), dim=2, keepdim=True), dim=3, keepdim=True).squeeze()
RuntimeError: dimension out of range (expected to be in range of [-1, 0], but got 2)

However, in this epoch, these are the failing lines:

tpmult = x_onehot * y_onehot    #times prediction and gt coincide is 1
tp = torch.sum(torch.sum(torch.sum(tpmult, dim=0, keepdim=True), dim=2, keepdim=True), dim=3, keepdim=True).squeeze()

and x_onehot is tensor([ 0.], device='cuda:0'), and y_onehot is tensor([ 1.], device='cuda:0').

Both tensors are 1-dim tensors, and your “second” torch.sum tries to sum over dim=2.
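
For illustration, a small sketch of why dim=2 is out of range on a 1-dim tensor:

import torch

t = torch.tensor([0.])       # 1-dim tensor, valid dims are -1 and 0
print(torch.sum(t, dim=0))   # works
# torch.sum(t, dim=2)        # RuntimeError: dimension out of range
                             # (expected to be in range of [-1, 0], but got 2)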

I changed it to this:
tp = torch.sum(torch.sum(torch.sum(tpmult, dim=0, keepdim=True))).squeeze()

and it no longer gives errors, but I am not sure whether this is the right thing to do. It is a cloned project that I am working on … so I'm not sure how this will affect the training or the process.

Hello, in this code I don't understand these lines:
target = ToLabel()(target)
target = Relabel(255, 19)(target)
I want to know what these functions do:
ToLabel()
Relabel()

I don’t know how ToLabel is implemented, but you could ask @mhusseinsh if this code could be shared so that you can profile it.
