Good afternoon!
I have questions about the following tutorial:

I have a similar dataset (images + landmarks) and the dataset is labeled.

  1. I didn’t quite understand from the tutorial when the labels are incorporated into the dataloader.
  2. When the tutorial does image rescaling, do the landmark coordinates change accordingly?
  3. In terms of image augmentation options, the tutorial only demonstrates Rescale, RandomCrop and ToTensor. In case more augmentations are needed, should I write them as a separate class (like Rescale in the tutorial) or maybe it’s possible to use the built-in transforms options? (for example, transforms.RandomHorizontalFlip() or transforms.Normalize())

Hi Anna,

  1. The Dataset (FaceLandmarksDataset) is what returns both the image and the coordinates, in its __getitem__ method. This is typical: the DataLoader handles things like the order in which to go through the dataset, the minibatch size, and so on, but the core data is returned by the dataset rather than the dataloader.
  2. In the tutorial, the transforms they use take in a dictionary that contains both the image and the landmarks, and indeed make the appropriate transformation to both of those. You can see that in the definition of the Rescale and RandomCrop transforms:
        img = transform.resize(image, (new_h, new_w))
        # h and w are swapped for landmarks because for images,
        # x and y axes are axis 1 and 0 respectively
        landmarks = landmarks * [new_w / w, new_h / h]  # <-- here

So the landmark changes are not something that PyTorch does for you automatically; you have to take care of that yourself by writing your own custom logic.
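To make points 1 and 2 concrete, here is a minimal sketch of a dataset whose __getitem__ returns the image and its landmarks together, wrapped in a DataLoader. The class name LandmarksDataset and the random in-memory arrays are assumptions for illustration only; the tutorial's FaceLandmarksDataset reads images and a CSV from disk instead.

```python
import numpy as np
from torch.utils.data import Dataset, DataLoader


class LandmarksDataset(Dataset):
    """Hypothetical in-memory stand-in for the tutorial's FaceLandmarksDataset."""

    def __init__(self, images, landmarks, transform=None):
        assert len(images) == len(landmarks)
        self.images = images
        self.landmarks = landmarks
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        # The labels enter here: each sample carries image AND landmarks.
        sample = {"image": self.images[idx], "landmarks": self.landmarks[idx]}
        if self.transform is not None:
            # The transform receives the whole dict, so it can update both.
            sample = self.transform(sample)
        return sample


# Fake data: 6 RGB images of 32x32 pixels, 5 (x, y) landmarks each.
rng = np.random.default_rng(0)
images = rng.random((6, 32, 32, 3)).astype(np.float32)
landmarks = rng.random((6, 5, 2)).astype(np.float32) * 32

dataset = LandmarksDataset(images, landmarks)
# The DataLoader only decides ordering and batching; it adds no labels.
loader = DataLoader(dataset, batch_size=4, shuffle=True)
batch = next(iter(loader))
print(batch["image"].shape, batch["landmarks"].shape)
```

The default collate function stacks each dict entry separately, so a batch is again a dict with an "image" tensor and a "landmarks" tensor.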

  3. Since you also need to transform the landmarks, unfortunately you will not be able to use most built-in transforms out of the box, and you’ll have to write your own, like they do in the tutorial. If you were working on a project without localized data such as bounding boxes or landmarks (image classification, for example), then you absolutely could use the builtins.
    To be clear, you can use builtins that don’t affect the geometry of the image, such as ColorJitter.
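As an illustration of writing your own geometric transform, here is a rough sketch of a horizontal flip in the same dict-in, dict-out style as the tutorial's Rescale. This class is not part of the tutorial or of torchvision. It assumes the image is an H x W x C NumPy array and the landmarks are an N x 2 array of (x, y) pixel coordinates, with x running from 0 to W - 1.

```python
import numpy as np


class RandomHorizontalFlip:
    """Hypothetical dict-based flip; mirrors the image and the landmark x values."""

    def __init__(self, p=0.5, rng=None):
        self.p = p
        self.rng = rng if rng is not None else np.random.default_rng()

    def __call__(self, sample):
        image, landmarks = sample["image"], sample["landmarks"]
        if self.rng.random() >= self.p:
            return sample  # leave the sample untouched

        w = image.shape[1]
        flipped = image[:, ::-1].copy()  # reverse the width axis
        new_landmarks = landmarks.copy()
        # Assumed convention: x is a pixel index, so column 0 maps to w - 1.
        new_landmarks[:, 0] = (w - 1) - landmarks[:, 0]
        return {"image": flipped, "landmarks": new_landmarks}


image = np.arange(2 * 4 * 3).reshape(2, 4, 3)
landmarks = np.array([[0.0, 1.0], [3.0, 0.0]])
out = RandomHorizontalFlip(p=1.0)({"image": image, "landmarks": landmarks})
print(out["landmarks"])
```

Depending on how your landmarks are annotated, a flip may also call for swapping left/right point indices (a left-eye point ends up on the right side of the image); that step is dataset-specific and is left out here.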

Hope this helps!

Andrei, thank you so much for the detailed answer.
This is my first experience with PyTorch, hence tons of questions!
Just to clarify:

  1. If I understood your answer correctly, I need to incorporate labels in the dataset itself, not in the dataloader, right?
  2. Got it
  3. Could you, please, give some examples of possible augmentations, in addition to ColorJitter, for which built-ins can be used?

Thanks for your help!

  1. That’s right. The DataLoader just iterates over the dataset in various ways that you can specify (whether or not to go in random order, how many samples to deliver per batch, etc.), but the actual stuff that’s being yielded, in your case the pairs of (image, landmarks), comes from the dataset itself.

  3. Just going down the list here in order: ColorJitter, Grayscale, RandomGrayscale, GaussianBlur, RandomInvert, RandomPosterize, RandomSolarize, RandomAdjustSharpness, RandomAutocontrast, RandomEqualize, Normalize, RandomErasing

    Basically anything that alters the color of pixels but not their position is OK.

    Note that some of those work on both Tensors and PIL Images (two different data types that you can convert back and forth between using the ToTensor() and ToPILImage() transforms), while others work only on Tensors (such as Normalize).
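One practical detail: the built-in transforms expect an image, not the tutorial's {"image", "landmarks"} dict, so a color-only builtin still has to be applied to the image entry alone. A small sketch of one way to do that follows. ImageOnly is a made-up helper name, and brighten is a NumPy stand-in used so the snippet runs without torchvision; in practice you would pass something like transforms.ColorJitter(...) at a point in the pipeline where the image has the type that transform accepts.

```python
import numpy as np


class ImageOnly:
    """Hypothetical wrapper: apply an image-only transform, pass landmarks through."""

    def __init__(self, image_transform):
        self.image_transform = image_transform

    def __call__(self, sample):
        # Copy the dict, replacing only the image; landmarks stay as they are
        # because a color change does not move any pixel.
        return {**sample, "image": self.image_transform(sample["image"])}


def brighten(image, factor=1.2):
    """Stand-in for a photometric builtin; assumes float pixels in [0, 1]."""
    return np.clip(image * factor, 0.0, 1.0)


sample = {
    "image": np.full((4, 4, 3), 0.5),
    "landmarks": np.array([[1.0, 2.0], [3.0, 0.5]]),
}
out = ImageOnly(brighten)(sample)
print(out["image"][0, 0], out["landmarks"])
```

Wrapped this way, the color-only transforms can sit in the same Compose chain as the custom dict-based ones.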

Thank you so much, you’ve helped me a lot!