Using PyTorch transforms

Hi,

I’m using a set of transforms defined like this for the train_dataset:

from torchvision import transforms

def train_transformer():
    """
    Build the transform pipeline for the training dataset.
    :return: a composed transform
    """
    transformer = transforms.Compose([
        transforms.RandomCrop(size=(256, 256)),  # randomly crop an image
        transforms.RandomRotation(degrees=5),    # randomly rotate the image
        transforms.RandomHorizontalFlip(),  # randomly flip the image horizontally
        transforms.RandomVerticalFlip(),    # randomly flip the image vertically
        transforms.ToTensor()])  # convert it into a torch tensor

    return transformer

When I try to use it just before returning the sample (dict containing ‘image’ and ‘mask’),


        sample = {'image': image,
                  'mask': mask}

        if self.transform:
            sample = self.transform(sample)

        return sample

I get the following error:

AttributeError: Traceback (most recent call last):
  File "/tool/python/conda/env/gis36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 57, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/tool/python/conda/env/gis36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 57, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/project/geospatial/application/cs230-sifd/source/step/loader/sifd/dataset.py", line 375, in __getitem__
    sample = self.transform(sample)
  File "/tool/python/conda/env/gis36/lib/python3.6/site-packages/torchvision-0.2.1-py3.6.egg/torchvision/transforms/transforms.py", line 49, in __call__
    img = t(img)
  File "/tool/python/conda/env/gis36/lib/python3.6/site-packages/torchvision-0.2.1-py3.6.egg/torchvision/transforms/transforms.py", line 421, in __call__
    i, j, h, w = self.get_params(img, self.size)
  File "/tool/python/conda/env/gis36/lib/python3.6/site-packages/torchvision-0.2.1-py3.6.egg/torchvision/transforms/transforms.py", line 394, in get_params
    w, h = img.size
AttributeError: 'dict' object has no attribute 'size'

I assume self.transform is the transformer.

You cannot apply the transformation to a dict; you should apply it to PIL Images.
So self.transform(sample['image']) will probably work.
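i.e. something along these lines (just a sketch; the names and sizes here are only for illustration):

from PIL import Image
from torchvision import transforms

transform = transforms.Compose([transforms.RandomCrop((256, 256)),
                                transforms.ToTensor()])

image = Image.new('RGB', (512, 512))          # stand-in for a real image
sample = {'image': image, 'mask': image}

sample['image'] = transform(sample['image'])  # works: the transform sees a PIL Image
# transform(sample) would raise AttributeError: 'dict' object has no attribute 'size'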

If you need the exact same transformation for your sample and mask, which seems to be the case, have a look at this post.

So, I need to define a transform method internal to my dataset class and use the transforms.Compose set of transforms to process the image and mask separately.

What is the recommended/torch-framework-compatible way of returning images and labels from a dataset? Is it better to return them separately, or to return them in a sample dict?

The PyTorch tutorials use the sample dict approach: Writing Custom Datasets, DataLoaders and Transforms — PyTorch Tutorials documentation

Sample of our dataset will be a dict {‘image’: image, ‘landmarks’: landmarks}. Our dataset will take an optional argument transform so that any required processing can be applied on the sample.

It’s easy to miss the fact that torchvision transforms cannot handle both the image and the mask in one go without going through the sources.

I think it depends on your coding style. Dicts could make it easier to pass data around.
On the other hand, as far as I know, no method handles dicts directly, so you would have to unpack them.

I think the safest way is to include your transformations in __getitem__ and make sure to apply the same transformation to the image and the mask.

I added this method to my dataset class, which takes the user-supplied transform as an additional parameter:

    def _transform(self, image, mask, transform):
        """
        Apply transforms to both the image and the mask.

        :param image: The input image.
        :param mask: The ground-truth mask.
        :param transform: A transform object.
        :return: transformed image and mask
        """

        image = transform(image)
        mask  = transform(mask)

        return image, mask

I call it in my __getitem__():

    def __getitem__(self, index):

        <snip>

        """
        Sample of our dataset will be dict {'image': image, 'mask': mask}.
        This dataset will take an optional argument transform so that any
        required processing can be applied on the sample.

        We will rescale the sample and convert it to uint8 so that it can be
        viewed as a normal image file. This will be useful for offline batch
        processing to generate a cached dataset.
        """

        image = (image * 255).astype(np.uint8)
        mask  = (mask  * 255).astype(np.uint8)

        """
        Apply user-specified transforms to image and mask.
        """
        if self.transform:
            image, mask = self._transform(image, mask, self.transform)

        """
        Sample of our dataset will be dict {'image': image, 'mask': mask}.
        This dataset will take an optional argument transform so that any
        required processing can be applied on the sample.
        """
        sample = {'image': image,
                  'mask' : mask}

        return sample

and when I try to use it, I get the following error:

TypeError: Traceback (most recent call last):
  File "/tool/python/conda/env/gis36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 57, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/tool/python/conda/env/gis36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 57, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/project/geospatial/application/cs230-sifd/source/step/loader/sifd/dataset.py", line 388, in __getitem__
    image, mask = self._transform(image, mask, self.transform)
  File "/project/geospatial/application/cs230-sifd/source/step/loader/sifd/dataset.py", line 189, in _transform
    image = transform(image)
  File "/tool/python/conda/env/gis36/lib/python3.6/site-packages/torchvision-0.2.1-py3.6.egg/torchvision/transforms/transforms.py", line 49, in __call__
    img = t(img)
  File "/tool/python/conda/env/gis36/lib/python3.6/site-packages/torchvision-0.2.1-py3.6.egg/torchvision/transforms/transforms.py", line 421, in __call__
    i, j, h, w = self.get_params(img, self.size)
  File "/tool/python/conda/env/gis36/lib/python3.6/site-packages/torchvision-0.2.1-py3.6.egg/torchvision/transforms/transforms.py", line 394, in get_params
    w, h = img.size
TypeError: 'int' object is not iterable

Try to convert your numpy image to a PIL.Image:

import numpy as np
import torch
from torchvision import transforms
import torchvision.transforms.functional as TF

transform = transforms.RandomCrop(24)

x = np.ones((3, 50, 50), dtype=np.uint8)
x = x * 255

x = torch.from_numpy(x)
x = TF.to_pil_image(x)
transform(x)

As a side note, I think your transformation won’t work correctly, since now transform is called sequentially on the image and mask, which will sample different random values for your random transformations.
Your image might therefore be cropped at a different position than the mask.

Oh, what should I do? I thought I’d resort to using this to save some time.

I ran into some issues with PIL not being able to read a 10-channel binary mask from disk. I was planning on using the Augmenter library, which is really good at generating cached datasets. The image cropping works fine, but if the ground truths have more than (I’m guessing) 3 channels, PIL, which is used internally by the Augmenter library, fails.

So, I thought I’d use PyTorch’s transform routines to consistently crop, rotate, and flip images from the dataset to feed the NN.

Well, torchvision’s transformations are also built on top of PIL.
I’m not sure if there is a workaround to use 10-channel images in PIL.
Maybe you could load the 10-channel mask and slice it into 10 binary images?
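e.g. something along these lines (just a sketch, assuming the mask is an (H, W, 10) uint8 array):

import numpy as np
from PIL import Image

mask = np.zeros((256, 256, 10), dtype=np.uint8)  # stand-in mask
mask_layers = [Image.fromarray(mask[:, :, c]) for c in range(mask.shape[2])]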

I’ve posted a link to an example using the functional API to transform the image and mask.
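The gist of that approach is to draw the random parameters once and apply them to both the image and the mask via the functional API. A minimal sketch (not the exact code from the linked post):

import random
import torchvision.transforms.functional as TF
from torchvision import transforms

def paired_transform(image, mask, crop_size=(256, 256)):
    # sample the crop parameters once and reuse them for image and mask
    i, j, h, w = transforms.RandomCrop.get_params(image, output_size=crop_size)
    image = TF.crop(image, i, j, h, w)
    mask = TF.crop(mask, i, j, h, w)

    # flip both (or neither) based on the same random draw
    if random.random() > 0.5:
        image = TF.hflip(image)
        mask = TF.hflip(mask)

    return TF.to_tensor(image), TF.to_tensor(mask)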

This is my mask generation function. I concatenate the individual class layers and return the mask to the dataset.

    def _generate_mask(self, sample_shapes, height, width, sample_grid):
        """
        Generate the ground-truth mask.
        :param sample_shapes: per-class shapes for this sample
        :param height: mask height
        :param width: mask width
        :param sample_grid: grid used to generate each class layer
        :return: the concatenated mask
        """
        layers = []
        # generate individual mask layers
        for class_id in range(1, self.classes, 1):
            layer = self._generate_class_truth_layer(shapes=sample_shapes[class_id],
                                                     height=height,
                                                     width=width,
                                                     sample_grid=sample_grid)
            layers.append(np.expand_dims(layer, 2))

        # concatenate individual mask layers
        mask = np.concatenate(layers, axis=2)

        # generate background mask
        if self.background_mask:
            background = np.ones((height, width, 1)) - np.expand_dims(np.logical_or.reduce((mask[:, :, :10] == 1), axis=2), axis=2)
            # first channel is the background mask
            mask = np.concatenate((background, mask), axis=2).astype(np.uint8)

        return mask

This is the portion of my dataset that returns the image and mask towards the end.

        """
        Mask generation
        """
        mask = self.mask_generator.mask(id=id_, height=h, width=w)
        if mask is None:
            raise ValueError('Could not generate concatenated mask!')

        """
        Sample of our dataset will be dict {'image': image, 'mask': mask}.
        This dataset will take an optional argument transform so that any
        required processing can be applied on the sample.

        We will rescale the sample and convert it to uint8 so that it can be
        viewed as a normal image file. This will be useful for offline batch
        processing to generate a cached dataset.
        """

        image = (image * 255).astype(np.uint8)
        mask  = (mask  * 255).astype(np.uint8)

        image = torch.from_numpy(image)
        image = tvf.to_pil_image(image)

        mask = torch.from_numpy(mask)
        mask = tvf.to_pil_image(mask)

        """
        Apply user-specified transforms to image and mask.
        """
        if self.transform:
            image, mask = self._transform(image, mask, self.transform)

        """
        Sample of our dataset will be dict {'image': image, 'mask': mask}.
        This dataset will take an optional argument transform so that any
        required processing can be applied on the sample.
        """
        sample = {'image': image,
                  'mask' : mask}

        return sample

So, we’re doing all this just because PIL can’t handle transformation of a 10-channel mask. If I don’t use the torchvision transforms and instead write my own NumPy-based transforms, it should work?

Yes, it should work and should also be quite fast, since numpy conversions are nearly free because the array shares its data with the tensor.
Also, have a look at @ncullen93’s gist. There might be some useful transformations for you.
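A quick way to see that memory sharing (minimal sketch):

import numpy as np
import torch

a = np.zeros((2, 2), dtype=np.float32)
t = torch.from_numpy(a)   # no copy: the tensor shares memory with the array
a[0, 0] = 1.0
print(t[0, 0])            # prints tensor(1.) because the data is shared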

I tried this one, taken from the gist,

    def _transform(self, image, mask):
        """
        Apply transforms to both the image and the mask.

        :param image: The input image.
        :param mask: The ground-truth mask.
        :return: transformed image and mask
        """

        # image ordering
        img_row_axis = 0
        img_col_axis = 1
        channel_axis = 2

        # random crop
        c_h = self.preprocessing_params['image']['crop']['height']
        c_w = self.preprocessing_params['image']['crop']['width']
        image, mask = random_crop(image, mask, size=(c_h, c_w))

        if self.mode == 'train' or self.mode == 'dev':
            # random horizontal flip
            if np.random.random() > 0.5:
                image = np.asarray(image).swapaxes(img_col_axis, 0)
                image = image[::-1, ...]
                image = image.swapaxes(0, img_col_axis)

                mask = np.asarray(mask).swapaxes(img_col_axis, 0)
                mask = mask[::-1, ...]
                mask = mask.swapaxes(0, img_col_axis)

        # transform to tensor
        image = image_to_tensor(image)
        mask = binary_mask_to_tensor(mask, threshold=0.5)

        return image, mask

but it throws an error:

ValueError: Traceback (most recent call last):
  File "/tool/python/conda/env/gis36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 57, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/tool/python/conda/env/gis36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 57, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/project/geospatial/application/cs230-sifd/source/step/loader/sifd/dataset.py", line 441, in __getitem__
    image, mask = self._transform(image, mask)
  File "/project/geospatial/application/cs230-sifd/source/step/loader/sifd/dataset.py", line 236, in _transform
    image = image_to_tensor(image)
  File "/project/geospatial/application/cs230-sifd/source/step/preprocessing/image/tensor.py", line 48, in image_to_tensor
    tensor = torch.from_numpy(image).float()
ValueError: some of the strides of a given numpy array are negative. This is currently not supported, but will be added in future releases.

The random crop operation works fine; it’s the tensor conversion right after the random horizontal flip that fails.
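For reference, random_crop here just crops the image and mask at the same random position; roughly something like this, though the actual helper may differ:

import numpy as np

def random_crop(image, mask, size):
    """Crop channel-last image and mask arrays at the same random position."""
    c_h, c_w = size
    h, w = image.shape[0], image.shape[1]
    top = np.random.randint(0, h - c_h + 1)
    left = np.random.randint(0, w - c_w + 1)
    image = image[top:top + c_h, left:left + c_w, ...]
    mask = mask[top:top + c_h, left:left + c_w, ...]
    return image, mask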

The tensor conversion functions are as follows:


def image_to_tensor(image):
    """
    Transform a numpy image into a torch tensor.

    We will have to swap the channel axis because numpy uses channel last ordering and
    torch uses channel first ordering.
    - numpy image: H x W x C
    - torch image: C X H X W

    :param image (np.ndarray): Input image.
    :return: tensor: A PyTorch tensor.
    """

    image = image.transpose(2, 0, 1)
    tensor = torch.from_numpy(image).float()
    return tensor


def binary_mask_to_tensor(mask, threshold):
    """
    Transform a binary mask to a tensor.

    We will have to swap the channel axis because numpy uses channel last ordering and
    torch uses channel first ordering.
    - numpy image: H x W x C
    - torch image: C X H X W

    :param mask (np.ndarray): A binary mask array, usually of type uint8.
    :param threshold: The threshold used to consider if the mask is present.
    :return: tensor: A PyTorch tensor.
    """

    mask = mask.transpose(2, 0, 1)
    mask = binarize(mask, threshold).astype(np.float32)
    tensor = torch.from_numpy(mask).float()
    return tensor
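(binarize is just a small thresholding helper; a sketch of the idea, though the exact implementation may differ:)

import numpy as np

def binarize(mask, threshold):
    """Assumed implementation: threshold a mask array into {0, 1} values."""
    return (mask > threshold).astype(np.uint8)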

Try to add .copy() to the numpy arrays with negative strides, e.g. here:

image = image[::-1, ...].copy()

This will copy the data and make it contiguous again.
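To illustrate the stride issue and the fix (minimal sketch):

import numpy as np
import torch

a = np.arange(6, dtype=np.float32).reshape(2, 3)
flipped = a[::-1, ...]                  # a view with negative strides
# torch.from_numpy(flipped)             # would raise the ValueError above
t = torch.from_numpy(flipped.copy())    # .copy() makes the memory contiguous again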