How to use custom image transformations with torchvision

My problem is fairly simple, but I’m not sure if I’m doing it correctly. I will describe what I’m doing so far and hope someone can tell me whether it is right, as I have not found a solution online.

I have coded an algorithm that applies the “Shades of Gray” normalization to an image. I want this algorithm to run on every image in my dataset. To do this I create a transforms.Compose. A snippet of the code looks like this:

import torchvision
from torchvision import transforms
from color_constancy import shades_of_gray

transform = transforms.Compose([
                               shades_of_gray(),
                               transforms.RandomVerticalFlip(),
                               transforms.RandomHorizontalFlip(),
                               transforms.ToTensor()
                              ])

dataset = torchvision.datasets.ImageFolder(train_path, transform=transform)

The module “color_constancy” is a self-made script. The shades_of_gray method only accepts one image at a time, but I suppose that the transform is applied image-wise (again, correct me if I’m wrong). After this I just pass the dataset to a dataloader and continue with the standard procedure.

Thanks for any advice,

Best regards.

Am

Your approach looks alright.
Have a look at these transform implementations, which you could use as a template for your custom transform. :wink:
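
On the image-wise question: torchvision datasets such as ImageFolder apply `transform` to one sample at a time when that sample is indexed. The sketch below imitates that pattern in plain Python so it runs without torchvision; `TinyImageDataset` and `record_and_double` are hypothetical stand-ins, not library code, and integers play the role of images.

```python
class TinyImageDataset:
    """Hypothetical stand-in that mimics how a dataset applies its transform."""

    def __init__(self, samples, transform=None):
        self.samples = samples        # list of (image, label) pairs
        self.transform = transform    # any callable taking one image

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        image, label = self.samples[index]
        # The transform sees exactly one image per call.
        if self.transform is not None:
            image = self.transform(image)
        return image, label


calls = []

def record_and_double(image):
    # Records every input so we can see what the transform was called with.
    calls.append(image)
    return image * 2


dataset = TinyImageDataset([(1, "a"), (2, "b"), (3, "c")], transform=record_and_double)
first = dataset[0]   # triggers the transform for the first sample only
```

After indexing one sample, `calls` contains a single entry, which is why a transform that accepts one image at a time fits this design.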


Thank you for the confirmation. While trying to implement it I ran into a problem. This shades_of_gray method takes an image as an argument:

def shades_of_gray(img, power=6, extra=None):
    
    # Parameters
    # ----------
    # img: numpy array
    #   The original image with format of (h, w, c)
    # power: int
    #   The degree of norm, 6 is used in reference paper
    
    img_dtype = img.dtype

    img = img.astype('float32')
    img_power = numpy.power(img, power)
    rgb_vec = numpy.power(numpy.mean(img_power, (0,1)), 1/power)
    rgb_norm = numpy.power(numpy.sum(numpy.power(rgb_vec, extra)),1/extra)
    rgb_vec = rgb_vec/rgb_norm
    rgb_vec = 1/(rgb_vec*numpy.sqrt(3))
    img = numpy.multiply(img, rgb_vec)

    return img.astype(img_dtype)

Thus, when I build the transforms.Compose, I get: TypeError: shades_of_gray() missing 1 required positional argument: 'img'.

I suppose the error is not tricky to solve, but I can’t figure out what it is. From your link I suspect that I should make a class with __call__ and __repr__ methods, but I don’t fully understand how I should do that.

Thank you for your time.
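
The error comes from the parentheses: writing `shades_of_gray()` inside the list calls the function immediately with no arguments, before any image exists. A minimal sketch in plain Python follows; the function body is a placeholder and the list stands in for the transform list, both assumptions made for illustration.

```python
def shades_of_gray(img, power=6):
    # Placeholder body; the real normalization is irrelevant to the error.
    return img


try:
    # The call happens here, while the list is being built, with no image.
    pipeline = [shades_of_gray()]
except TypeError as err:
    message = str(err)

# Passing the callable itself (no parentheses) defers the call until
# an image is available.
pipeline = [shades_of_gray]
result = pipeline[0]("fake image")
```

The transform list needs callables, so either the bare function or an instance of a class that defines `__call__` would avoid the error. Note that inside a torchvision pipeline the callable would receive a PIL image rather than a numpy array, so a conversion step is still needed.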

Thanks to @ptrblck’s link I could figure out how to implement my transform. I had to make a few tweaks to convert from PIL image to numpy and back, but so far it isn’t throwing errors anymore. In case this proves useful for anyone in the future (I know it would have been for me), I will leave my final code below.

import numpy
from PIL import Image

class shades_of_gray(object):
    # Applies the "Shades of Gray" normalization to a single image.
    # The image is converted to a numpy array with format (h, w, c).
    # power: int
    #     The degree of norm, 6 is used in reference paper

    def __call__(self, img):
        """
        :param img: (PIL Image): Image

        :return: Normalized image
        """
        img = numpy.asarray(img)
        img_dtype = img.dtype

        power = 6
        extra = 6

        img = img.astype('float32')
        img_power = numpy.power(img, power)
        rgb_vec = numpy.power(numpy.mean(img_power, (0, 1)), 1 / power)
        rgb_norm = numpy.power(numpy.sum(numpy.power(rgb_vec, extra)), 1 / extra)
        rgb_vec = rgb_vec / rgb_norm
        rgb_vec = 1 / (rgb_vec * numpy.sqrt(3))
        img = numpy.multiply(img, rgb_vec)
        img = img.astype(img_dtype)

        return Image.fromarray(img)

    def __repr__(self):
        return self.__class__.__name__+'()'

If you see anything wrong or have any tips that I could follow, feel free to share them. Also, I don’t fully understand what the “def __repr__(self)” line actually does.

Thank you for your time.
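
One possible refinement, offered as a sketch rather than a drop-in replacement: move the constants into `__init__`, and round and clip before casting back to 8 bits, since values above 255 would otherwise wrap around. The class name `ShadesOfGray` and the parameters `norm_order` and `eps` are hypothetical. This variant defaults to the Euclidean norm (`norm_order=2.0`), which differs from the `extra = 6` used in the code above; with order 2 the `sqrt(3)` factor cancels exactly for a neutral gray image, so such an image passes through unchanged. It also assumes the input can be converted to 8-bit RGB.

```python
import numpy as np
from PIL import Image


class ShadesOfGray:
    """Hypothetical parametrized variant of the transform above."""

    def __init__(self, power=6, norm_order=2.0, eps=1e-6):
        self.power = power            # degree of the Minkowski mean
        self.norm_order = norm_order  # order of the norm used to scale the estimate
        self.eps = eps                # guards against division by zero

    def __call__(self, img):
        arr = np.asarray(img.convert("RGB"), dtype=np.float64)
        # Per-channel Minkowski mean over all pixels: the illuminant estimate.
        illum = np.power(np.mean(np.power(arr, self.power), axis=(0, 1)),
                         1.0 / self.power)
        illum = np.maximum(illum, self.eps)
        # Scale the estimate to unit length in the chosen norm.
        illum = illum / np.power(np.sum(np.power(illum, self.norm_order)),
                                 1.0 / self.norm_order)
        gain = 1.0 / (illum * np.sqrt(3.0))
        # Round and clip so the cast to uint8 cannot truncate or wrap around.
        out = np.clip(np.rint(arr * gain), 0, 255).astype(np.uint8)
        return Image.fromarray(out)

    def __repr__(self):
        return (f"{self.__class__.__name__}"
                f"(power={self.power}, norm_order={self.norm_order})")
```

It would be used the same way as the class above, for example `ShadesOfGray(power=6)` as the first entry of the transform list.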


Your code looks good. :slight_smile:

The __repr__ method returns the string that describes the object, e.g. what is shown if you use print(my_transform).
You could also remove it and just use the default Python implementation.
Other transform classes use it to print additional information about the passed arguments etc.
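
To make the difference concrete, here is a small sketch comparing the default representation with a custom one. Both classes are hypothetical and do nothing to the image.

```python
class WithoutRepr:
    def __call__(self, img):
        return img


class WithRepr:
    def __init__(self, power=6):
        self.power = power

    def __call__(self, img):
        return img

    def __repr__(self):
        # Report the class name together with the stored arguments.
        return f"{self.__class__.__name__}(power={self.power})"


default_text = repr(WithoutRepr())    # generic text with a memory address
custom_text = repr(WithRepr(power=6))
print(custom_text)                    # prints: WithRepr(power=6)
```

The default text only names the class and a memory address, while the custom one shows the settings, which helps when logging a whole transform pipeline.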

I am trying to add Gaussian noise as part of the image transforms. I was able to add noise to a tensor, but I want to add noise to PIL Image data. How can I modify the code block below to do that?

import torch

class gaussianNoise():
    def __init__(self, mean, stddev):
        self.mean = mean
        self.stddev = stddev

    def __call__(self, tensor):
        # Sample noise with the same shape, dtype and device as the input
        noise = torch.zeros_like(tensor).normal_(self.mean, self.stddev)
        # Note: add_ modifies the input tensor in place
        return tensor.add_(noise)

    def __repr__(self):
        return f"{self.__class__.__name__}(mean={self.mean}, stddev={self.stddev})"

PIL Images can be converted to numpy arrays directly, so you could create the array via:

arr = np.array(img)

and use numpy methods to add the noise.
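
A minimal sketch of that idea, under a few assumptions: the class name `GaussianNoisePIL` and the `seed` parameter are hypothetical, `mean` and `stddev` are expressed in 0–255 pixel units (not the 0–1 range of a tensor produced by ToTensor), and the input is an 8-bit image in mode L, RGB or RGBA.

```python
import numpy as np
from PIL import Image


class GaussianNoisePIL:
    """Hypothetical PIL counterpart of the tensor-based transform above."""

    def __init__(self, mean=0.0, stddev=10.0, seed=None):
        self.mean = mean
        self.stddev = stddev
        self.rng = np.random.default_rng(seed)

    def __call__(self, img):
        # Work in floating point so the noise is not truncated early.
        arr = np.asarray(img, dtype=np.float64)
        noise = self.rng.normal(self.mean, self.stddev, size=arr.shape)
        # Round and clip before casting back to the 8-bit range.
        noisy = np.clip(np.rint(arr + noise), 0, 255).astype(np.uint8)
        return Image.fromarray(noisy)

    def __repr__(self):
        return (f"{self.__class__.__name__}"
                f"(mean={self.mean}, stddev={self.stddev})")
```

Because it takes and returns a PIL image, it would go before ToTensor in the transform list, whereas the tensor version goes after it.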