Hello everybody!
I’m using the Chars74K dataset to train a model to recognize text in the wild. I would like to run some tests with different models, transformations, and hyperparameters and compare the results.
I would like to:
Binarize each image in the train loader (black or white pixels) using a threshold
Apply a mask (Chars74K provides a mask for each image to cut out only the number/letter in the image)
The easiest way to apply both transformations would, in my opinion, be to create a new Dataset.
Since you are lazily loading your images from a folder, we should keep it that way.
Just get all image paths (e.g. with glob) and pass them to the Dataset.
I created a small example, which should give you a good starting point:
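from PIL import Image
from torch.utils.data import Dataset


class MyDataset(Dataset):
    def __init__(self, data_paths, target, threshold, masks, transform=None):
        self.data_paths = data_paths  # list of image file paths
        self.target = target          # one label per image
        self.transform = transform    # should convert the PIL image to a tensor
        self.threshold = threshold
        self.masks = masks            # one mask tensor per image

    def __getitem__(self, index):
        # Load lazily: the image is only read when the sample is requested
        x = Image.open(self.data_paths[index])
        y = self.target[index]
        if self.transform:
            x = self.transform(x)
        # Apply threshold here
        x = x > self.threshold
        x = x.float()  # Cast back to float, since x is a BoolTensor now (ByteTensor in older PyTorch versions)
        # Apply mask here
        x = x * self.masks[index]
        return x, y

    def __len__(self):
        return len(self.data_paths)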
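If it helps, building data_paths and target with glob could look roughly like the sketch below. It assumes a hypothetical layout with one sub-folder per class (root/<class_name>/<image>.png); the function name collect_paths_and_targets and the "*.png" pattern are placeholders of mine, not part of any library.

```python
import glob
import os


def collect_paths_and_targets(root, pattern="*.png"):
    """Return sorted image paths, one integer target per path, and the class mapping.

    Assumes (hypothetically) that images live in root/<class_name>/<file>.
    """
    # One level of class folders, then the image files inside each folder
    data_paths = sorted(glob.glob(os.path.join(root, "*", pattern)))

    # The parent folder name of each image is treated as its class name
    class_names = sorted({os.path.basename(os.path.dirname(p)) for p in data_paths})
    class_to_idx = {name: i for i, name in enumerate(class_names)}

    targets = [class_to_idx[os.path.basename(os.path.dirname(p))] for p in data_paths]
    return data_paths, targets, class_to_idx
```

The first two return values can then be passed as data_paths and target to the Dataset above.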
You can try the binarization and masking with these sample lines:
import torch

# Create fake images (random_ excludes the upper bound, so 256 yields values in [0, 255])
data = torch.Tensor(100, 1, 24, 24).random_(0, 256)
print(data)
# Apply threshold
data = data > 128
data = data.float()
print(data)
# Create fake masks for every image
mask = torch.Tensor(100, 1, 24, 24).random_(0, 2)
data = data * mask
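One thing to keep in mind when moving from the fake data to real images: if your transform contains torchvision's ToTensor, the pixel values are rescaled to floats in [0, 1], so a threshold such as 128 would never be exceeded. The sketch below only mimics that rescaling with a division by 255 (an assumption made to keep the example free of image files); the variable names are illustrative.

```python
import torch

# Fake 8-bit grayscale image with values in [0, 255]
raw = torch.randint(0, 256, (1, 24, 24), dtype=torch.uint8)

# Mimic the rescaling to [0, 1] that ToTensor applies to 8-bit images
scaled = raw.float() / 255

# A threshold on the 0-255 scale zeroes out every pixel after rescaling
wrong = (scaled > 128).float()

# A threshold on the [0, 1] scale behaves as intended
right = (scaled > 0.5).float()

print(wrong.sum().item())  # no pixel survives the mismatched threshold
print(right.mean().item())  # fraction of pixels set to white
```

So the threshold you pass to the Dataset should be on the same scale as the tensor that comes out of the transform.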
Thank you for your help. The sample code you gave me does exactly what I want, but I don’t know how to implement the first piece of code.
I load my images with the datasets.ImageFolder function, which lets me split the images under the root path into one folder per class (and automatically treats each folder as a different target), so I don’t know how I can use glob in this case.
Ok, I see. Since you have sorted your images in a nice way, let’s just use some internal functions from torchvision. Here you can see some methods which are used to collect all images and create the targets for them.
Let’s just use these instead of rewriting the code:
imgs is a list of tuples storing an image path with its target.
You can add these lines to your __init__ method of the Dataset and store imgs as a member.
In the __getitem__ function you could call path, target = self.imgs[index] to get the current sample.