Only batches of spatial targets supported (non-empty 3D tensors) but got targets of size: : [1, 1, 256, 256]

alex_d · January 20, 2020, 6:30pm

I am not completely sure if this is what you mean by creating the target, but I created masked images with two classes (background and car) like in the examples below.

car_for_mask

masked_car

Note: The images above are the same size; they only appear different here because one of them is a screenshot.

Then I have this class that takes in the path of the original images and masked images:

import os
import random

import torchvision.transforms.functional as TF
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms


class MyDataset(Dataset):
    def __init__(self, image_paths, target_paths, train=True):
        self.image_paths = image_paths
        self.target_paths = target_paths
        # Sort both listings so image i and mask i stay paired
        # (os.listdir returns entries in arbitrary order)
        self.image_dirs = sorted(os.listdir(self.image_paths))
        self.target_dirs = sorted(os.listdir(self.target_paths))

    def transform(self, image, mask):
        # Resize
        resize = transforms.Resize(size=(768, 1024))
        image = resize(image)
        mask = resize(mask)

        # Random crop (same crop window for image and mask)
        i, j, h, w = transforms.RandomCrop.get_params(
            image, output_size=(750, 1000))
        image = TF.crop(image, i, j, h, w)
        mask = TF.crop(mask, i, j, h, w)

        # Random horizontal flipping
        if random.random() > 0.5:
            image = TF.hflip(image)
            mask = TF.hflip(mask)

        # Random vertical flipping
        if random.random() > 0.5:
            image = TF.vflip(image)
            mask = TF.vflip(mask)

        # Transform to tensor
        image = TF.to_tensor(image)
        mask = TF.to_tensor(mask)

        # Normalize the image with the ImageNet mean and std
        image = TF.normalize(image, [0.485, 0.456, 0.406], [0.229, 0.224, 0.225])

        return image, mask

    def __getitem__(self, index):
        image = Image.open(os.path.join(self.image_paths, self.image_dirs[index]))
        mask = Image.open(os.path.join(self.target_paths, self.target_dirs[index]))
        x, y = self.transform(image, mask)
        return x, y

    def __len__(self):
        return len(self.image_dirs)

Is that what you mean by creating a target? Or am I missing something?
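For reference, here is a minimal sketch of the shape mismatch that the error message in the title describes. It assumes the loss is nn.CrossEntropyLoss (the loss is not shown in this post) and uses made-up tensors in place of the real model output and mask:

```python
import torch
import torch.nn as nn

# Hypothetical shapes: batch of 1, 2 classes (background, car), 256x256 pixels.
logits = torch.randn(1, 2, 256, 256)   # stand-in for the model output: [N, C, H, W]

# Stand-in for a batched mask that kept its channel dimension: [N, 1, H, W], float.
mask = torch.zeros(1, 1, 256, 256)
mask[:, :, 100:150, 100:150] = 1.0     # pretend this square is "car"

criterion = nn.CrossEntropyLoss()

# With class-index targets, the loss expects [N, H, W] with dtype int64,
# so drop the channel dimension and cast to long.
target = mask.squeeze(1).long()
loss = criterion(logits, target)

print(target.shape, target.dtype, loss.item())
```

Note that this sketch assumes the mask already holds the values 0 and 1. TF.to_tensor scales 8-bit pixel values to the range [0, 1], so a gray value of 128 would become roughly 0.5 and be truncated to 0 by .long().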

One thing I am confused about is the part where you mentioned creating class indices, because besides using the masked images I haven’t created any class indices. I know I asked a similar question in this post, and I haven’t responded to your last reply because I was still trying to make sense of it. I was hoping that running the code would help me figure it out, but I’m still confused.

One of the reasons I am confused is that since I am using a pre-trained model, wouldn’t there be an existing mapping of the colors already that I could refer to instead of creating my own indices?

I was looking at this tutorial, and the part below seems to have a mapping. I don’t know if this is the official color mapping used for the pre-trained resnet_101 segmentation model. If it is, when fine-tuning a model, wouldn’t it be enough to have masked images that follow this color code (in my case, (0, 0, 0) for background and (128, 128, 128) for car) for the model to deduce which class each pixel belongs to? Or would I have to create new class indices anyway?

And if I do have to create new class indices, as you mention in this post, in which part of the code should this happen? Should I have a helper function that creates the class indices (mapping each color to a class)? Should I include it within the MyDataset class, and at what point should I call it?

import numpy as np

# Define the helper function
def decode_segmap(image, nc=21):
    """Convert a 2D array of class indices into an RGB image of shape [H, W, 3]."""
    label_colors = np.array([
        (0, 0, 0),  # 0=background
        # 1=aeroplane, 2=bicycle, 3=bird, 4=boat, 5=bottle
        (128, 0, 0), (0, 128, 0), (128, 128, 0), (0, 0, 128), (128, 0, 128),
        # 6=bus, 7=car, 8=cat, 9=chair, 10=cow
        (0, 128, 128), (128, 128, 128), (64, 0, 0), (192, 0, 0), (64, 128, 0),
        # 11=dining table, 12=dog, 13=horse, 14=motorbike, 15=person
        (192, 128, 0), (64, 0, 128), (192, 0, 128), (64, 128, 128), (192, 128, 128),
        # 16=potted plant, 17=sheep, 18=sofa, 19=train, 20=tv/monitor
        (0, 64, 0), (128, 64, 0), (0, 192, 0), (128, 192, 0), (0, 64, 128),
    ])

    # One array per color channel, same shape as the index map
    r = np.zeros_like(image).astype(np.uint8)
    g = np.zeros_like(image).astype(np.uint8)
    b = np.zeros_like(image).astype(np.uint8)

    # Paint every pixel of class `label` with that class's color
    for label in range(nc):
        idx = image == label
        r[idx] = label_colors[label, 0]
        g[idx] = label_colors[label, 1]
        b[idx] = label_colors[label, 2]

    rgb = np.stack([r, g, b], axis=2)
    return rgb
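
To make the question concrete, the helper I have in mind would do the reverse of decode_segmap: take an RGB mask and return class indices. The following is only a sketch of that idea. The name encode_segmap, the COLOR_TO_CLASS dictionary, and the choice of index 1 for car are all hypothetical and do not come from the tutorial or from any library:

```python
import numpy as np

# Hypothetical mapping for a two-class case; the colors are the ones used in my masks.
COLOR_TO_CLASS = {
    (0, 0, 0): 0,        # background
    (128, 128, 128): 1,  # car
}

def encode_segmap(mask_rgb, color_to_class=COLOR_TO_CLASS):
    """Convert an [H, W, 3] RGB mask into an [H, W] array of class indices."""
    mask_rgb = np.asarray(mask_rgb)
    # Pixels whose color is not in the mapping stay at 0 (background).
    class_map = np.zeros(mask_rgb.shape[:2], dtype=np.int64)
    for color, class_idx in color_to_class.items():
        # True where all three channels match this color
        matches = np.all(mask_rgb == np.array(color), axis=-1)
        class_map[matches] = class_idx
    return class_map

# Tiny made-up mask: a 2x2 gray "car" square on a black background.
mask = np.zeros((4, 4, 3), dtype=np.uint8)
mask[1:3, 1:3] = (128, 128, 128)
indices = encode_segmap(mask)
print(indices)
```

If something like this is the right idea, I assume it would be called on the mask inside transform, after the resize, crop, and flips but in place of TF.to_tensor(mask). That placement is my guess, which is what I am asking about.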