Binarize image on training

Hello everybody!
I’m using the Chars74K dataset to train a model to recognize text in the wild. I would like to run some tests with different models, transformations, and hyperparameters and compare the results.

I would like to:

  • Binarize each image (black or white pixels only) with a threshold as it comes out of the train loader
  • Apply a mask (Chars74K provides a mask for each image that cuts out only the number/letter in the image)

This is how I load the images from the folder:

import torch
import torchvision
import numpy as np
import matplotlib.pyplot as plt

from torchvision import transforms, datasets

data_transform = transforms.Compose([
        transforms.Resize((64,64)),
        transforms.Grayscale(),
        transforms.ToTensor(),
        # Grayscale() yields a single channel, so Normalize takes one mean/std value
        transforms.Normalize(mean=[0.5],
                             std=[0.225])
    ])
chars74k_dataset = datasets.ImageFolder(root='English/Img/GoodImg/Bmp',
                                           transform=data_transform)
dataset_loader = torch.utils.data.DataLoader(chars74k_dataset,
                                             batch_size=1, shuffle=True,
                                             num_workers=4)

Thank you for your help
Matteo

The easiest way to apply both transformations would, in my opinion, be to create a new Dataset.
Since you are lazily loading your images from a folder, we should keep it that way.
Just get all image paths (e.g. with glob) and pass them to the Dataset.

I created a small example, which should give you a good starting point:

from PIL import Image
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, data_paths, target, threshold, masks, transform=None):
        self.data_paths = data_paths
        self.target = target
        self.transform = transform
        self.threshold = threshold
        self.masks = masks
    
    def __getitem__(self, index):
        x = Image.open(self.data_paths[index])
        y = self.target[index]
    
        if self.transform:
            x = self.transform(x)
    
        # Apply threshold here
        x = x > self.threshold
        x = x.float()  # Cast back to float, since the comparison returns a ByteTensor (a BoolTensor in newer PyTorch)
    
        # Apply mask here
        x = x * self.masks[index] 
    
        return x, y

    def __len__(self):
        return len(self.data_paths)
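To feed this Dataset, the paths and targets could be gathered roughly as follows. This is only a sketch: `collect_paths` is a hypothetical helper, and it assumes one sub-folder per class under the root and a `*.png` file pattern, which you may need to adjust.

```python
import glob
import os


def collect_paths(root, pattern="*.png"):
    """Return sorted image paths and integer targets, one class per sub-folder."""
    # Every sub-folder of root is treated as one class (assumption).
    classes = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    class_to_idx = {name: i for i, name in enumerate(classes)}

    data_paths, targets = [], []
    for name in classes:
        for path in sorted(glob.glob(os.path.join(root, name, pattern))):
            data_paths.append(path)
            targets.append(class_to_idx[name])
    return data_paths, targets, class_to_idx
```

The first two returned lists can then be passed as data_paths and target to MyDataset.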

You can try the binarization and masking with these sample lines:

# Create fake images
data = torch.Tensor(100, 1, 24, 24).random_(0, 255)
print(data)

# Apply threshold
data = data > 128
data = data.float()
print(data)

# Create fake masks for every image
mask = torch.Tensor(100, 1, 24, 24).random_(0, 2)
data = data * mask
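One thing to keep in mind: the fake images above live in the 0 to 255 range, while transforms.ToTensor() scales real images to [0, 1], so the threshold has to match the scale. A small sketch of that, with made-up data and thresholds chosen only for illustration:

```python
import torch

torch.manual_seed(0)

# Fake images with integer pixel values in [0, 255]
data_255 = torch.randint(0, 256, (4, 1, 24, 24)).float()
# The same images scaled to [0, 1], as ToTensor() would deliver them
data_01 = data_255 / 255.0

# A threshold of 127.5 on the 0-255 scale matches 0.5 on the [0, 1] scale
bin_255 = (data_255 > 127.5).float()
bin_01 = (data_01 > 0.5).float()

print(torch.equal(bin_255, bin_01))
```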

Let me know if this works for you.

Thank you for your help. The sample code you gave me does exactly what I want, but I don’t know how to implement the first piece of code.

I load my images with the datasets.ImageFolder function. It lets me sort every image under the root path into a separate folder per class (and automatically treats each folder as a different target), so I don’t know how I can use glob in this case.

Anyway, I have added lambda functions to the transform of my dataset, and that seems to work for binarizing.

data_transform = transforms.Compose([
        transforms.Resize((64,64)),
        transforms.Grayscale(),
        transforms.ToTensor(),
        # Grayscale() yields a single channel, so Normalize takes one mean/std value
        transforms.Normalize(mean=[0.5],
                            std=[0.225]),
        lambda x: x>0,
        lambda x: x.float(),
        transforms.Normalize(mean=[0.5],
                            std=[0.225])
    ])

I still have to implement masking on the input.
Thank you for your help

Matteo

Ok, I see. Since you have sorted your images in a nice way, let’s just use some internal functions from torchvision.
The torchvision.datasets.folder module has the methods that ImageFolder uses to collect all images and create the targets for them.
Let’s just use these instead of rewriting the code:

root = '[YOURPATH]/Bmp'
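classes, class_to_idx = torchvision.datasets.folder.find_classes(root)
# Newer torchvision versions require the extensions argument in make_dataset
imgs = torchvision.datasets.folder.make_dataset(
    root, class_to_idx, extensions=torchvision.datasets.folder.IMG_EXTENSIONS)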
path, target = imgs[0]

imgs is a list of tuples, each storing an image path together with its target.
You can add these lines to the __init__ method of your Dataset and store imgs as a member.

In the __getitem__ function you could call path, target = self.imgs[index] to get the current sample.

Hope that helps!
