Inconsistent tensor sizes at the last layer of SegNet

Hi, I am using SegNet for image segmentation.

But I am getting this error:

Iter:1
x torch.Size([1, 3, 256, 256])
conv1 torch.Size([1, 64, 256, 256])
layer1 torch.Size([1, 256, 256, 256])
layer2 torch.Size([1, 512, 256, 256])
layer3 torch.Size([1, 1024, 256, 256])
layer4 torch.Size([1, 2048, 256, 256])
/opt/anaconda/lib/python3.6/site-packages/torch/nn/modules/upsampling.py:180: UserWarning: nn.UpsamplingBilinear2d is deprecated. Use nn.Upsample instead.
  warnings.warn("nn.UpsamplingBilinear2d is deprecated. Use nn.Upsample instead.")
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-75-a24d68a5b61a> in <module>()
      8         targets = Variable(labels)
      9 
---> 10         outputs = model(inputs)
     11         optimizer.zero_grad()
     12         print("outputs size ==> ",outputs.size())

/opt/anaconda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    323         for hook in self._forward_pre_hooks.values():
    324             hook(self, input)
--> 325         result = self.forward(*input, **kwargs)
    326         for hook in self._forward_hooks.values():
    327             hook_result = hook(self, input, result)

<ipython-input-7-8f80fe0445f5> in forward(self, x)
     60             self.layer5c(x),
     61             self.layer5d(x),
---> 62         ], 1))
     63 
     64         print('final', x.size())

RuntimeError: inconsistent tensor sizes at /opt/conda/conda-bld/pytorch_1513368888240/work/torch/lib/TH/generic/THTensorMath.c:2864

This is my network code:

class PSPNet(nn.Module):

    def __init__(self, num_classes):
        #super(PSPNet,self).__init__()
        super().__init__()

        resnet = models.resnet101(pretrained=True)
        
        self.conv1 = resnet.conv1
        self.layer1 = resnet.layer1
        self.layer2 = resnet.layer2
        self.layer3 = resnet.layer3
        self.layer4 = resnet.layer4
        
        
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                m.stride = 1
                m.requires_grad = False
            if isinstance(m, nn.BatchNorm2d):
                m.requires_grad = False
        
        
        self.layer5a = PSPDec(2048, 512, 60)
        self.layer5b = PSPDec(2048, 512, 30)
        self.layer5c = PSPDec(2048, 512, 20)
        self.layer5d = PSPDec(2048, 512, 10)
        
        self.final = nn.Sequential(
            nn.Conv2d(2048, 512, 3, padding=1, bias=False),
            nn.BatchNorm2d(512, momentum=.95),
            nn.ReLU(inplace=True),
            nn.Dropout(.1),
            nn.Conv2d(512, num_classes, 1),
        )

    def forward(self, x):
        
        print('x', x.size())
        
        x = self.conv1(x)
        print('conv1', x.size())
        
        x = self.layer1(x)
        print('layer1', x.size())
        
        x = self.layer2(x)
        print('layer2', x.size())
        
        x = self.layer3(x)
        print('layer3', x.size())
        
        x = self.layer4(x)
        print('layer4', x.size())
        
        x = self.final(torch.cat([
            x,
            self.layer5a(x),
            self.layer5b(x),
            self.layer5c(x),
            self.layer5d(x),
        ], 1))
        
        print('final', x.size())

        return F.upsample_bilinear(final, x.size()[2:])

And here is how I am running it:


for epoch in range(1, num_epochs+1):
    epoch_loss = []
    iteration=1
    for step, (images, labels) in enumerate(trainLoader):
        print("Iter:"+str(iteration))
        iteration=iteration+1
        inputs = Variable(images)
        targets = Variable(labels)
        
        outputs = model(inputs)
        optimizer.zero_grad()
        print("outputs size ==> ",outputs.size())
        print("targets[:, 0] size ==> ",targets[:, 0].size())

 
        loss = criterion(outputs, targets[:, 0])
        loss.backward()
        optimizer.step()
        epoch_loss.append(loss.data[0])
        
        average = sum(epoch_loss) / len(epoch_loss)
        
        print("loss: "+str(average)+" epoch: "+str(epoch)+", step: "+str(step))

The error is coming from this torch.cat call (the 1 is the concatenation dimension):

x = self.final(torch.cat([
            x,
            self.layer5a(x),
            self.layer5b(x),
            self.layer5c(x),
            self.layer5d(x),
        ], 1))

Can anyone please help me solve this problem?
Thanks in advance.

Probably the padding/shape in PSPDec?
Note that you don’t print the sizes of the layers being concatenated.

Best regards

Thomas

Hi @tom,

thanks a lot for your reply.

Here is the code for the padding/shape in PSPDec:

class PSPDec(nn.Module):

    def __init__(self, in_features, out_features, downsize, upsize=18):
        super(PSPDec,self).__init__()

        self.features = nn.Sequential(
            nn.AvgPool2d(downsize, stride=downsize),
            nn.Conv2d(in_features, out_features, 1, bias=False),
            nn.BatchNorm2d(out_features,momentum=.95),
            nn.ReLU(inplace=True),
            nn.UpsamplingBilinear2d(upsize)
        )

    def forward(self, x):
        return self.features(x)

Also, I have printed the sizes of the layers being concatenated; here they are:

        print('x', x.size())
        
        x = self.conv1(x)
        print('conv1', x.size())
        
        x = self.layer1(x)
        print('layer1', x.size())
        
        x = self.layer2(x)
        print('layer2', x.size())
        
        x = self.layer3(x)
        print('layer3', x.size())
        
        x = self.layer4(x)
        print('layer4', x.size())
        
        t5a = self.layer5a(x)
        print('layer5a',t5a.size())
        
        
        t5b = self.layer5b(x)
        print('layer5b',t5b.size())
        
        t5c = self.layer5c(x)
        print('layer5c',t5c.size())
        
        t5d = self.layer5d(x)
        print('layer5d',t5d.size())
        
        
        x = self.final(torch.cat([
            x,
            t5a,
            t5b,
            t5c,
            t5d,
        ], 1))
        
        print('final', x.size())

        return F.upsample_bilinear(final, x.size()[2:])

Here is the result:


x torch.Size([1, 3, 256, 256])
conv1 torch.Size([1, 64, 256, 256])
layer1 torch.Size([1, 256, 256, 256])
layer2 torch.Size([1, 512, 256, 256])
layer3 torch.Size([1, 1024, 256, 256])
layer4 torch.Size([1, 2048, 256, 256])
/opt/anaconda/lib/python3.6/site-packages/torch/nn/modules/upsampling.py:180: UserWarning: nn.UpsamplingBilinear2d is deprecated. Use nn.Upsample instead.
  warnings.warn("nn.UpsamplingBilinear2d is deprecated. Use nn.Upsample instead.")
layer5a torch.Size([1, 512, 18, 18])
layer5b torch.Size([1, 512, 18, 18])
layer5c torch.Size([1, 512, 18, 18])
layer5d torch.Size([1, 512, 18, 18])
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-24-7d73f3a481fa> in <module>()
      8         targets = Variable(labels)
      9 
---> 10         outputs = model(inputs)
     11         optimizer.zero_grad()
     12         print("outputs size ==> ",outputs.size())

/opt/anaconda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    323         for hook in self._forward_pre_hooks.values():
    324             hook(self, input)
--> 325         result = self.forward(*input, **kwargs)
    326         for hook in self._forward_hooks.values():
    327             hook_result = hook(self, input, result)

<ipython-input-7-86307f5840a1> in forward(self, x)
     74             t5c,
     75             t5d,
---> 76         ], 1))
     77 
     78         print('final', x.size())

RuntimeError: inconsistent tensor sizes at /opt/conda/conda-bld/pytorch_1513368888240/work/torch/lib/TH/generic/THTensorMath.c:2864

Can you please help me?

Thanks a lot again.

The x and the t5* tensors don’t have compatible spatial dimensions: 256 vs. 18 in width and height. Which dimensions did you expect them to have? You could pool x down to 18, or use upsize=256 in PSPDec (though 256 sounds like a lot to me compared to 18).
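A minimal sketch of the first option, pooling x down to the branches’ 18×18 resolution before the cat (shapes taken from your printouts; F.adaptive_avg_pool2d is just one way to do the pooling):

import torch
import torch.nn.functional as F

# Shapes from the printouts above: x is [1, 2048, 256, 256],
# each layer5 branch is [1, 512, 18, 18].
x = torch.randn(1, 2048, 256, 256)
branch = torch.randn(1, 512, 18, 18)

# Pool x down to 18x18 so all tensors share the same spatial size before cat
x_small = F.adaptive_avg_pool2d(x, 18)                      # [1, 2048, 18, 18]
merged = torch.cat([x_small, branch, branch, branch, branch], 1)
print(merged.size())                                        # [1, 4096, 18, 18]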

Best regards

Thomas

Hi @tom, sorry for the late reply, and thanks a lot for your help.

I downsized x from 256 to 18, but I am still running into another issue.

This is the error:

Iter:1
x torch.Size([1, 3, 256, 256])
conv1 torch.Size([1, 64, 256, 256])
layer1 torch.Size([1, 256, 256, 256])
layer2 torch.Size([1, 512, 256, 256])
layer3 torch.Size([1, 1024, 256, 256])
layer4 torch.Size([1, 2048, 256, 256])
After downsample x ==>  torch.Size([1, 2048, 18, 18])
/opt/anaconda/lib/python3.6/site-packages/torch/nn/modules/upsampling.py:180: UserWarning: nn.UpsamplingBilinear2d is deprecated. Use nn.Upsample instead.
  warnings.warn("nn.UpsamplingBilinear2d is deprecated. Use nn.Upsample instead.")
layer5a torch.Size([1, 512, 18, 18])
layer5b torch.Size([1, 512, 18, 18])
layer5c torch.Size([1, 512, 18, 18])
layer5d torch.Size([1, 512, 18, 18])
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-32-7d73f3a481fa> in <module>()
      8         targets = Variable(labels)
      9 
---> 10         outputs = model(inputs)
     11         optimizer.zero_grad()
     12         print("outputs size ==> ",outputs.size())

/opt/anaconda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    323         for hook in self._forward_pre_hooks.values():
    324             hook(self, input)
--> 325         result = self.forward(*input, **kwargs)
    326         for hook in self._forward_hooks.values():
    327             hook_result = hook(self, input, result)

<ipython-input-12-f3975da59387> in forward(self, x)
     85             t5c,
     86             t5d,
---> 87         ], 1))
     88 
     89         print('final', x.size())

/opt/anaconda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    323         for hook in self._forward_pre_hooks.values():
    324             hook(self, input)
--> 325         result = self.forward(*input, **kwargs)
    326         for hook in self._forward_hooks.values():
    327             hook_result = hook(self, input, result)

/opt/anaconda/lib/python3.6/site-packages/torch/nn/modules/container.py in forward(self, input)
     65     def forward(self, input):
     66         for module in self._modules.values():
---> 67             input = module(input)
     68         return input
     69 

/opt/anaconda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    323         for hook in self._forward_pre_hooks.values():
    324             hook(self, input)
--> 325         result = self.forward(*input, **kwargs)
    326         for hook in self._forward_hooks.values():
    327             hook_result = hook(self, input, result)

/opt/anaconda/lib/python3.6/site-packages/torch/nn/modules/conv.py in forward(self, input)
    275     def forward(self, input):
    276         return F.conv2d(input, self.weight, self.bias, self.stride,
--> 277                         self.padding, self.dilation, self.groups)
    278 
    279 

/opt/anaconda/lib/python3.6/site-packages/torch/nn/functional.py in conv2d(input, weight, bias, stride, padding, dilation, groups)
     88                 _pair(0), groups, torch.backends.cudnn.benchmark,
     89                 torch.backends.cudnn.deterministic, torch.backends.cudnn.enabled)
---> 90     return f(input, weight, bias)
     91 
     92 

RuntimeError: Given groups=1, weight[512, 2048, 3, 3], so expected input[1, 4096, 18, 18] to have 2048 channels, but got 4096 channels instead


Here is my code for the final layer:

self.final = nn.Sequential(
            nn.Conv2d(2048, 512, 3, padding=1, bias=False),
            nn.BatchNorm2d(512, momentum=.95),
            nn.ReLU(inplace=True),
            nn.Dropout(.1),
            nn.Conv2d(512, num_classes, 1),
        )

Could you please help?

Thanks in advance.

This time the channels of the input to final and the layer definition do not match.
final expects 2048 in_channels, while the torch.cat operation concatenates x (2048 channels) with the four t5* tensors (512 channels each), which gives 2048 + 4 * 512 = 4096 channels.
You could set the in_channels of the first conv in final to 4096.
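For reference, a sketch of what that change could look like inside PSPNet.__init__ (the only difference from your original final is the first conv’s in_channels):

# x has 2048 channels and each of the four PSPDec branches has 512,
# so the concatenated tensor has 2048 + 4 * 512 = 4096 channels.
self.final = nn.Sequential(
    nn.Conv2d(4096, 512, 3, padding=1, bias=False),
    nn.BatchNorm2d(512, momentum=.95),
    nn.ReLU(inplace=True),
    nn.Dropout(.1),
    nn.Conv2d(512, num_classes, 1),
)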

Hi @ptrblck,
First of all, thanks a lot for your help.
I changed the code as you mentioned and that error is gone. :slight_smile:

However, I am now getting an error in this line:

---> 90 return F.upsample_bilinear(self.final, x.size()[2:])

The error is:
AttributeError: 'Sequential' object has no attribute 'dim'

Could you please help?

Thanks in advance.

final is the name of an nn.Sequential module, so it cannot be upsampled directly. I guess you would like to upsample its output?
If so, just pass the tensor x to F.upsample_bilinear.

Hi @ptrblck,

Thanks a lot for your help.

I am now passing x to F.upsample_bilinear, which is throwing this error:

TypeError                                 Traceback (most recent call last)
<ipython-input-25-7d73f3a481fa> in <module>()
      8         targets = Variable(labels)
      9 
---> 10         outputs = model(inputs)
     11         optimizer.zero_grad()
     12         print("outputs size ==> ",outputs.size())

/opt/anaconda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    323         for hook in self._forward_pre_hooks.values():
    324             hook(self, input)
--> 325         result = self.forward(*input, **kwargs)
    326         for hook in self._forward_hooks.values():
    327             hook_result = hook(self, input, result)

<ipython-input-8-a6972b090565> in forward(self, x)
     88         print('final', x.size())
     89 
---> 90         return F.upsample_bilinear(x)
     91         #return F.upsample_bilinear(self.final, x.size()[2:])

/opt/anaconda/lib/python3.6/site-packages/torch/nn/functional.py in upsample_bilinear(input, size, scale_factor)
   1422     # DeprecationWarning is ignored by default
   1423     warnings.warn("nn.functional.upsample_bilinear is deprecated. Use nn.functional.upsample instead.")
-> 1424     return upsample(input, size, scale_factor, mode='bilinear')
   1425 
   1426 

/opt/anaconda/lib/python3.6/site-packages/torch/nn/functional.py in upsample(input, size, scale_factor, mode)
   1373         raise NotImplementedError("Got 4D input, but linear mode needs 3D input")
   1374     elif input.dim() == 4 and mode == 'bilinear':
-> 1375         return _functions.thnn.UpsamplingBilinear2d.apply(input, _pair(size), scale_factor)
   1376     elif input.dim() == 4 and mode == 'trilinear':
   1377         raise NotImplementedError("Got 4D input, but trilinear mode needs 5D input")

/opt/anaconda/lib/python3.6/site-packages/torch/nn/_functions/thnn/upsampling.py in forward(ctx, input, size, scale_factor)
    275             output,
    276             ctx.output_size[0],
--> 277             ctx.output_size[1],
    278         )
    279         return output

TypeError: FloatSpatialUpSamplingBilinear_updateOutput received an invalid combination of arguments - got (int, torch.FloatTensor, torch.FloatTensor, NoneType, NoneType), but expected (int state, torch.FloatTensor input, torch.FloatTensor output, int outputHeight, int outputWidth)

You didn’t specify the size argument. The error points to return F.upsample_bilinear(x).
Try something like F.upsample_bilinear(x, [20, 20]).
I’m not sure what your first try was supposed to do, since it passes the tensor’s own size:

F.upsample_bilinear(x, x.size()[2:])
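As a small sketch of the intended call (sizes are assumed here: logits of shape [1, 2, 18, 18] upsampled back to the original 256×256 input resolution):

import torch
import torch.nn.functional as F

logits = torch.randn(1, 2, 18, 18)              # assumed output of self.final
out = F.upsample_bilinear(logits, [256, 256])   # explicit output size is required
print(out.size())                               # [1, 2, 256, 256]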

Hi @ptrblck,

Thanks a lot for your help. I understood my mistake and corrected it; I just did:

return F.upsample_bilinear(x, [256, 256])

But now I am getting an error in the evaluation of the loss function.

Iter:1
x torch.Size([1, 3, 256, 256])
conv1 torch.Size([1, 64, 256, 256])
layer1 torch.Size([1, 256, 256, 256])
layer2 torch.Size([1, 512, 256, 256])
layer3 torch.Size([1, 1024, 256, 256])
layer4 torch.Size([1, 2048, 256, 256])
After downsample x ==>  torch.Size([1, 2048, 18, 18])
/opt/anaconda/lib/python3.6/site-packages/torch/nn/modules/upsampling.py:180: UserWarning: nn.UpsamplingBilinear2d is deprecated. Use nn.Upsample instead.
  warnings.warn("nn.UpsamplingBilinear2d is deprecated. Use nn.Upsample instead.")
layer5a torch.Size([1, 512, 18, 18])
layer5b torch.Size([1, 512, 18, 18])
layer5c torch.Size([1, 512, 18, 18])
layer5d torch.Size([1, 512, 18, 18])
final torch.Size([1, 2, 18, 18])
/opt/anaconda/lib/python3.6/site-packages/torch/nn/functional.py:1423: UserWarning: nn.functional.upsample_bilinear is deprecated. Use nn.functional.upsample instead.
  warnings.warn("nn.functional.upsample_bilinear is deprecated. Use nn.functional.upsample instead.")
outputs size ==>  torch.Size([1, 2, 256, 256])
targets[:, 0] size ==>  torch.Size([1, 256, 256])
/opt/anaconda/lib/python3.6/site-packages/ipykernel_launcher.py:13: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
  del sys.path[0]
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-22-7d73f3a481fa> in <module>()
     14 
     15 
---> 16         loss = criterion(outputs, targets[:, 0])
     17         loss.backward()
     18         optimizer.step()

/opt/anaconda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    323         for hook in self._forward_pre_hooks.values():
    324             hook(self, input)
--> 325         result = self.forward(*input, **kwargs)
    326         for hook in self._forward_hooks.values():
    327             hook_result = hook(self, input, result)

<ipython-input-1-62b1bfa509f1> in forward(self, outputs, targets)
     11 
     12     def forward(self, outputs, targets):
---> 13         return self.loss(F.log_softmax(outputs), targets)

/opt/anaconda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    323         for hook in self._forward_pre_hooks.values():
    324             hook(self, input)
--> 325         result = self.forward(*input, **kwargs)
    326         for hook in self._forward_hooks.values():
    327             hook_result = hook(self, input, result)

/opt/anaconda/lib/python3.6/site-packages/torch/nn/modules/loss.py in forward(self, input, target)
    145         _assert_no_grad(target)
    146         return F.nll_loss(input, target, self.weight, self.size_average,
--> 147                           self.ignore_index, self.reduce)
    148 
    149 

/opt/anaconda/lib/python3.6/site-packages/torch/nn/functional.py in nll_loss(input, target, weight, size_average, ignore_index, reduce)
   1049         return torch._C._nn.nll_loss(input, target, weight, size_average, ignore_index, reduce)
   1050     elif dim == 4:
-> 1051         return torch._C._nn.nll_loss2d(input, target, weight, size_average, ignore_index, reduce)
   1052     else:
   1053         raise ValueError('Expected 2 or 4 dimensions (got {})'.format(dim))

RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed.  at /opt/conda/conda-bld/pytorch_1513368888240/work/torch/lib/THNN/generic/SpatialClassNLLCriterion.c:111

I am using 2 classes, so the class values should be 0 and 1.

Does it mean the output is generating some different values?

If that is the case, how can I find/fix it?
Can I use a sigmoid here to squash the output values into the range 0 to 1?

Thanks in advance.

Could you print the shape of outputs and targets?
outputs should have the dimensions [batch_size, n_classes, w, h]
while targets should be [batch_size, w, h].
You are indexing it with targets[:, 0], so maybe this indexing leads to the error.
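For a 2D segmentation loss, a minimal shape sketch (with assumed sizes: batch_size=1, n_classes=2, 256×256 masks; the target must hold class indices in [0, n_classes-1]):

import torch
import torch.nn.functional as F

outputs = torch.randn(1, 2, 256, 256)            # [batch_size, n_classes, h, w] logits
targets = (torch.rand(1, 256, 256) * 2).long()   # [batch_size, h, w], values 0 or 1
loss = F.nll_loss(F.log_softmax(outputs, dim=1), targets)
print(loss)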

I found the root cause of the problem.

My target tensor contains values like 15, 40, etc.:

x, y = train_dataset.__getitem__(0)

y contains values like 15, 40, etc.

So, can you please help me convert the label values to 0 and 1?

Thanks in advance.

The more important question is why your target contains these values.
If your goal is segmentation, each target value should represent a class index.
Could you explain a bit more about the target, i.e. how you load it, etc.?

Hi @ptrblck,

thanks a lot for your reply.

Here is my dataloader code:

import os
import collections
import torch
import torchvision
import numpy as np
import scipy.misc as m
import matplotlib.pyplot as plt
import yaml
from PIL import Image
from torch.utils import data


class ImageLoader(data.Dataset):
    
    
    def make_dataset(self,dir, set):
        images = []
        if set == 'train':
            fname = os.path.join(dir, 'train_test_split.yaml')
        elif set == 'test':
            fname = os.path.join(dir, 'train_test_split.yaml')

        # read the content of the file
        with open(fname,'r') as f:
            doc = yaml.load(f)

        imagesNum = doc[set]
        imageFolderPath = dir + os.sep + 'images' + os.sep
        file_list = []
        
        for x in imagesNum:
            item = imageFolderPath + os.sep + str(x)+'_image.png'

            width, height = Image.open(open(item,'rb')).size
            #print("Width ==> ",width," Height ==> ",height)
            
            file_list.append(item)
        
        self.files[set]=file_list
            
        return
    
    def __init__(self, root, split="train", img_size=None,input_transform=None, target_transform=None):
        self.root = root
        self.split = split
        self.img_size = [1296, 966]
        
        
        self.n_classes = 2
        self.files = collections.defaultdict(list)
        
        self.input_transform = input_transform
        self.target_transform = target_transform
        
        self.make_dataset(root,split)

    def __len__(self):
        return len(self.files[self.split])

    def __getitem__(self, index):
        img_name = self.files[self.split][index]
                
        img_path = img_name
        img_name = os.path.split(img_path)[1]
        #print("Img name ==> ",img_name)
        img_num = img_name.split("_")[0]
        #print("Img num ==> ",img_num)
        
        lbl_path = self.root + os.sep + 'annotations' + os.sep + img_num +"_" + "annotation.png"

        with open(img_path, 'rb') as f:
            image = Image.open(f).convert('RGB')
            
        
        with open(lbl_path, 'rb') as f:
            label = Image.open(f).convert('P')
            
        if self.input_transform is not None:
            image = self.input_transform(image)
        if self.target_transform is not None:
            label = self.target_transform(label) 

        return image, label

and I am transforming the images before passing them to the model:


import numpy as np
import torch

from PIL import Image

def colormap(n):
    cmap=np.zeros([n, 3]).astype(np.uint8)

    for i in np.arange(n):
        r, g, b = np.zeros(3)

        for j in np.arange(8):
            r = r + (1<<(7-j))*((i&(1<<(3*j))) >> (3*j))
            g = g + (1<<(7-j))*((i&(1<<(3*j+1))) >> (3*j+1))
            b = b + (1<<(7-j))*((i&(1<<(3*j+2))) >> (3*j+2))

        cmap[i,:] = np.array([r, g, b])

    return cmap

class Relabel:

    def __init__(self, olabel, nlabel):
        self.olabel = olabel
        self.nlabel = nlabel

    def __call__(self, tensor):
        assert isinstance(tensor, torch.LongTensor), 'tensor needs to be LongTensor'
        tensor[tensor == self.olabel] = self.nlabel
        return tensor


class ToLabel:

    def __call__(self, image):
        return torch.from_numpy(np.array(image)).long().unsqueeze(0)


class Colorize:

    def __init__(self, n=22):
        self.cmap = colormap(256)
        self.cmap[n] = self.cmap[-1]
        self.cmap = torch.from_numpy(self.cmap[:n])

    def __call__(self, gray_image):
        size = gray_image.size()
        color_image = torch.ByteTensor(3, size[1], size[2]).fill_(0)

        for label in range(1, len(self.cmap)):
            mask = gray_image[0] == label

            color_image[0][mask] = self.cmap[label][0]
            color_image[1][mask] = self.cmap[label][1]
            color_image[2][mask] = self.cmap[label][2]

        return color_image

and this is how I am calling these transformation functions:

color_transform = Colorize()
image_transform = ToPILImage()

input_transform = Compose([
    CenterCrop(256),
    ToTensor(),
    Normalize([.485, .456, .406], [.229, .224, .225]),
])
target_transform = Compose([
    CenterCrop(256),
    ToLabel(),
    Relabel(255, 2),
])

I suspect the erroneous labelling is caused by target_transform, but I am not able to figure out the exact place.

Could you please help?

Thanks a lot in advance.

The Relabel class should set all pixel values of 255 to 2?
If so, change it to Relabel(255, 1), since you have two classes.

However, this does not solve the issue of having target values of 15 and 40.
Could you add the following line after loading the label:

print(np.unique(np.array(label)))
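For example, a sketch of where that print could go inside ImageLoader.__getitem__, right after the annotation is opened and before any transform runs (lbl_path, Image and np are the names already used in your dataset code):

        with open(lbl_path, 'rb') as f:
            label = Image.open(f).convert('P')
        print(np.unique(np.array(label)))   # raw values present in this mask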

I cannot thank you enough, @ptrblck!

I will make all these changes and let you know.

=========================

Update
Hi @ptrblck,
This is how I modified the code:

target_transform = Compose([
    CenterCrop(256),
    #Resize(136),
    ToLabel(),
    Relabel(255, 1),
])

and I added a print statement to print the labels:


for epoch in range(1, num_epochs+1):
    epoch_loss = []
    iteration=1
    for step, (images, labels) in enumerate(trainLoader):
        print("Iter:"+str(iteration))
        print("Labels ==> ",np.unique(np.array(labels)))
        iteration=iteration+1
        inputs = Variable(images)
        targets = Variable(labels)
        
        outputs = model(inputs)
        optimizer.zero_grad()
        print("outputs size ==> ",outputs.size())
        
        print("outputs[:,0] size ==> ",outputs[:,0].size())
        print("targets[:, 0] size ==> ",targets[:, 0].size())


        #loss = criterion(outputs, targets[:, 0])
        
        #print(targets[:,0])
        loss = criterion(outputs,targets[:,0])
        
        loss.backward()
        optimizer.step()
        epoch_loss.append(loss.data[0])
        
        average = sum(epoch_loss) / len(epoch_loss)
        
        print("loss: "+str(average)+" epoch: "+str(epoch)+", step: "+str(step))

And this is the output I am getting:

Iter:1
Labels ==>  [ 0 40]
x torch.Size([1, 3, 256, 256])
conv1 torch.Size([1, 64, 256, 256])
layer1 torch.Size([1, 256, 256, 256])
layer2 torch.Size([1, 512, 256, 256])
layer3 torch.Size([1, 1024, 256, 256])

Could you please help?

Could you post the image source or upload one target image?
Somehow you have two values in the first target image: [0, 40].
You could try to call Relabel(40, 1), but maybe the next image has a different value, so that’s not a general solution.

If possible, I would like to load one image and see what kind of values are stored there.

Also, the output does not match the code exactly. I’m still wondering about the shape of targets.

Thanks for the image.
The target image still has 20 different target values.
Using a histogram and a quick visual inspection of the array, it seems that the only valid values should be:

  • 0 for background
  • 15 for class0
  • 40 for class1

All other values seem to be created by some interpolation technique.
Did you resample/interpolate the target image using something like bilinear interpolation?
If so, you should use nearest interpolation instead to avoid creating undefined class values.
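For example, if the masks are resized somewhere, a sketch with PIL (the file name is just a placeholder for one of your annotation images):

from PIL import Image

label = Image.open('annotation.png').convert('P')
# NEAREST keeps the original label values; bilinear would create new in-between values
label_resized = label.resize((256, 256), Image.NEAREST)
print(sorted(set(label_resized.getdata())))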

However, you could just remove the interp values and go with the images you have now.
Try the following code to clean your targets:

im_arr = np.array(label)
# Remove interp values
im_arr[(im_arr!=0) & (im_arr!=15) & (im_arr!=40)] = 0
# Set to classes
im_arr[im_arr==15] = 1
im_arr[im_arr==40] = 2
print(np.unique(im_arr))

Using this code you will get small border artifacts where the interpolation was previously.
If you have the raw data, you could re-create the dataset with nearest interpolation; if not, this could be good enough.
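A sketch of how that cleanup could be wrapped as a transform and dropped into your existing target_transform Compose (using the same Compose, CenterCrop and ToLabel as above; the CleanLabels name is just a suggestion, and it would replace the Relabel step):

import numpy as np
from PIL import Image

class CleanLabels:
    """Map raw annotation values to class indices: 0 -> 0, 15 -> 1, 40 -> 2."""
    def __call__(self, label):
        im_arr = np.array(label)
        # drop interpolation artefacts, then map the valid values to classes
        im_arr[(im_arr != 0) & (im_arr != 15) & (im_arr != 40)] = 0
        im_arr[im_arr == 15] = 1
        im_arr[im_arr == 40] = 2
        return Image.fromarray(im_arr)

target_transform = Compose([
    CenterCrop(256),
    CleanLabels(),
    ToLabel(),
])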

Also, change the number of classes to 3, since you have a background class and two plant classes. :wink:
