How to convert RGB images with many different colors (not only red, green, blue) into classes for segmentation training? The mask is linked below.

This code shows an example of the transformation from colors to class indices.
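In spirit, it does something like the following minimal sketch, assuming the mask is loaded as an [H, W, 3] uint8 tensor (the function name here is illustrative):

import torch

# Sketch: map each unique color in an [H, W, 3] uint8 mask to a class index
def colors_to_indices(target):
    h, w = target.shape[0], target.shape[1]
    # Each unique color becomes one class index
    colors = torch.unique(target.view(-1, target.size(2)), dim=0)
    mapping = {tuple(c.tolist()): i for i, c in enumerate(colors)}

    target = target.permute(2, 0, 1).contiguous()  # [3, H, W]
    mask = torch.empty(h, w, dtype=torch.long)
    for color, idx in mapping.items():
        # Mark pixels where all three channels match the current color
        match = target == torch.tensor(color, dtype=target.dtype).view(3, 1, 1)
        mask[match.sum(0) == 3] = idx
    return mask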


Thank you so much for the reply! I was also wondering how to pass these values to the nn.CrossEntropyLoss function. Won't I get a dimensional mismatch?
I also wanted to know: after creating the masks with your code, can I directly return them from __getitem__?

Yes, you should be able to transform the color masks to the mask targets containing indices and return them in the __getitem__.

No, nn.CrossEntropyLoss won’t raise an error if you pass the model outputs in the shape [batch_size, nb_classes, height, width] and the masks as [batch_size, height, width] for a multi-class segmentation use case.
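For example, a quick shape check along these lines should run without errors (the sizes here are placeholder values):

import torch
import torch.nn as nn

batch_size, nb_classes, height, width = 2, 5, 4, 4
output = torch.randn(batch_size, nb_classes, height, width)         # model logits
target = torch.randint(0, nb_classes, (batch_size, height, width))  # class indices
loss = nn.CrossEntropyLoss()(output, target)
print(loss)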


Thanks for the reply again. I’m clear on how to pass the arguments to nn.CrossEntropyLoss, but I’m not clear on how to pass the masks and images.
What values should I return in my __getitem__? Can you help me out with this? I’m new to PyTorch.

In the linked code we actually have to specify the number of classes, i.e. in my case the number of different colors. I don’t have that information with me. So, is there any other solution?

That’s a rather uncommon use case. How would you define the last output layer, if the number of classes is unknown?
Are you planning to add an “unknown” class category which the model should use for all additional classes?


I’m still working on it. I was thinking “number of unique colors = number of classes”.
Or am I wrong? Is there a different way I could train using this dataset?

Yes, usually the number of all unique colors in the dataset would correspond to the number of classes.

As explained before, you could try to add “new/unknown” colors to a special “unknown” class category, but it really depends on your use case and what you are trying to achieve.


Thank you for the reply. Assuming that every image in my dataset contains the same number of unique colors, I will move ahead.
But I’m still not clear on how I can pass the mask after using the code.
Can I know exactly what the code will return, e.g. its shape or content?

The __getitem__ should return an input tensor in the shape [channels, height, width] and a mask tensor in the shape [height, width] containing class indices in the range [0, nb_classes-1].
The DataLoader will then add the batch dimension to these tensors, such that the input tensor will have a shape of [batch_size, channels, height, width] while the mask [batch_size, height, width].
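As a rough illustration (the dataset here is a toy stand-in with random data and placeholder sizes):

import torch
from torch.utils.data import Dataset, DataLoader

class ToySegDataset(Dataset):
    def __len__(self):
        return 8

    def __getitem__(self, idx):
        image = torch.randn(3, 32, 32)        # [channels, height, width]
        mask = torch.randint(0, 5, (32, 32))  # [height, width], indices in [0, 4]
        return image, mask

loader = DataLoader(ToySegDataset(), batch_size=4)
images, masks = next(iter(loader))
print(images.shape)  # torch.Size([4, 3, 32, 32])
print(masks.shape)   # torch.Size([4, 32, 32])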


Thank you for the reply, it helped me a lot! I’ll get back to you if something goes wrong while implementing this.

Hello sir, thank you for all the help. Is there code to get the number of unique colors in an image?

Yes, my linked code snippet gets all unique colors from the image.

[...]
# Get color codes for dataset (maybe you would have to use more than a single
# image, if it doesn't contain all classes)
target = torch.from_numpy(target)
colors = torch.unique(target.view(-1, target.size(2)), dim=0).numpy()
[...]
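Applied to a single mask image, counting the classes would then boil down to something like this (a minimal sketch; "mask.png" is a placeholder path):

import numpy as np
import torch
from PIL import Image

# Load the color mask as an [H, W, 3] uint8 tensor
target = torch.from_numpy(np.array(Image.open("mask.png").convert("RGB")))
colors = torch.unique(target.view(-1, target.size(2)), dim=0)
print(len(colors))  # number of unique colors, i.e. the estimated number of classes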

Ignore the previous error. I’m using the linked code as shown above; my code is below and I’m getting this error: "The shape of the mask [3, 1000] at index 0 does not match the shape of the indexed tensor [1000, 1000] at index 0"
What are the values of h and w I should use? I highly doubt I’m using the code properly.

def __getitem__(self, idx):
    # load images and masks
    img_path = os.path.join(self.root, "original_images", self.imgs[idx])
    mask_path = os.path.join(self.root, "col", self.masks[idx])
    img = Image.open(img_path).convert("RGB")
    # note that we haven't converted the mask to RGB,
    # because each color corresponds to a different instance
    # with 0 being background
    h, w = 1000, 1000

    mask = Image.open(mask_path)
    #mask = np.array(mask)

    # Create mapping
    # Get color codes for dataset (maybe you would have to use more than a single
    # image, if it doesn't contain all classes)
    #target = torch.from_numpy(mask)
    target = self.transform(mask)  # converts to tensor, resizes to 1000x1000
    colors = torch.unique(target.view(-1, target.size(2)), dim=0).numpy()
    target = target.permute(2, 0, 1).contiguous()
    mapping = {tuple(c): t for c, t in zip(colors.tolist(), range(len(colors)))}

    mask = torch.empty(h, w, dtype=torch.long)
    for k in mapping:
        # Get all indices for current class
        idx = (target == torch.tensor(k, dtype=torch.uint8).unsqueeze(1).unsqueeze(2))
        validx = (idx.sum(0) == 3)  # Check that all channels match
        mask[validx] = torch.tensor(mapping[k], dtype=torch.long)
    return self.transform(img), mask

Hello sir, is there any example code for this? I’m getting a "1only batches of spatial targets supported (3D tensors) but got targets of size: : [1, 1000, 1000, 3]" error.

You could compare my code example to yours and try to find the difference or are you also seeing this error using my code snippet?
If you cannot find the issue, feel free to post an executable code snippet which reproduces the error.

This is how I used your code:
I don’t have that error anymore, but is this the proper way to use your code?
This is the current error:
"ValueError: Expected target size (1, 480), got torch.Size([1, 3, 480])"

I used nn.CrossEntropyLoss() with num_classes=256 (randomly initialized, because I was getting len(colors) as 2992).

class Cus_dataset(torch.utils.data.Dataset):
    def __init__(self, root, transform, transformm):
        self.root = root
        self.transform = transform
        self.transformm = transformm
        # load all image files, sorting them to
        # ensure that they are aligned
        self.imgs = list(sorted(os.listdir(os.path.join(root, "original_images"))))
        self.masks = list(sorted(os.listdir(os.path.join(root, "col"))))

    def __getitem__(self, idx):
        # load images and masks
        img_path = os.path.join(self.root, "original_images", self.imgs[idx])
        mask_path = os.path.join(self.root, "col", self.masks[idx])
        img = Image.open(img_path).convert("RGB")
        # note that we haven't converted the mask to RGB,
        # because each color corresponds to a different instance
        # with 0 being background
        mask = Image.open(mask_path)
        mask = np.array(mask)
        #colors = np.unique(mask)
        #mask = self.transform(mask)

        def mask_to_class(mask):
            target = torch.from_numpy(mask)
            h, w = target.shape[0], target.shape[1]
            masks = torch.empty(h, w, dtype=torch.long)
            colors = torch.unique(target.view(-1, target.size(2)), dim=0).numpy()
            target = target.permute(2, 0, 1).contiguous()
            mapping = {tuple(c): t for c, t in zip(colors.tolist(), range(len(colors)))}
            for k in mapping:
                idx = (target == torch.tensor(k, dtype=torch.uint8).unsqueeze(1).unsqueeze(2))
                validx = (idx.sum(0) == 3)  # Check that all channels match
                masks[validx] = torch.tensor(mapping[k], dtype=torch.long)
            return masks

        masks = mask_to_class(mask)
        return masks

Since you don’t know the number of classes, I think you would have to estimate it.
Do 2992 unique colors look right for the processed dataset?

What shape does the model output have?
If you are working on a multi-class segmentation use case, the output should have the shape [batch_size, nb_classes, height, width], while the target should have the shape [batch_size, height, width] and contain class indices in the range [0, nb_classes-1].

My input image size is 480x480x3. If I assume it has 256 classes, then the last layer of my model will have a channel size of 256.
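To sanity check these shapes, something like the following should hold (a sketch; a 1x1 convolution stands in for the real model, and the 256 classes and 480x480 size are the values from the post above):

import torch
import torch.nn as nn

last_layer = nn.Conv2d(3, 256, kernel_size=1)  # stand-in for the real model
images = torch.randn(1, 3, 480, 480)
output = last_layer(images)
print(output.shape)          # torch.Size([1, 256, 480, 480])
pred = output.argmax(dim=1)  # per-pixel class indices
print(pred.shape)            # torch.Size([1, 480, 480])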