Good afternoon!
I have questions about the following tutorial:
https://pytorch.org/tutorials/beginner/data_loading_tutorial.html
I have a similar dataset (images + landmarks) and the dataset is labeled.
- I didn’t quite understand from the tutorial at what point the labels are incorporated into the dataloader.
- When the tutorial does image rescaling, do the landmark coordinates change accordingly?
- In terms of image augmentation options, the tutorial only demonstrates Rescale, RandomCrop and ToTensor. If more augmentations are needed, should I write each one as a separate class (like Rescale in the tutorial), or is it possible to use the built-in transforms (for example, transforms.RandomHorizontalFlip() or transforms.Normalize())?
Hi Anna,
- The Dataset (FaceLandmarksDataset) is the one that returns both the image and the landmark coordinates in its __getitem__ method. This is typical: the dataloader handles things like the order in which to go through the dataset and the minibatch size, but the core data, labels included, is returned by the dataset rather than by the dataloader.
- In the tutorial, the transforms take in a dictionary that contains both the image and the landmarks, and they do apply the appropriate transformation to both. You can see that in the definition of the Rescale and RandomCrop transforms:
img = transform.resize(image, (new_h, new_w))
# h and w are swapped for landmarks because for images,
# x and y axes are axis 1 and 0 respectively
landmarks = landmarks * [new_w / w, new_h / h] # <-- here
So PyTorch does not adjust the landmarks for you automatically; you have to handle that explicitly in your own transform logic.
- Since the landmarks also need to be transformed, unfortunately you won’t be able to use most of the built-in transforms out of the box, and you’ll have to write your own, as the tutorial does. If you were working on a task without localized data such as bounding boxes or landmarks (image classification, for example), you could absolutely use the built-ins.
To be clear, you can still use built-ins that don’t affect the geometry of the image, such as ColorJitter, as long as you apply them to the image alone and pass the landmarks through unchanged.
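To make the point about where the labels live concrete, here’s a minimal sketch (not the tutorial’s code) of a landmarks dataset whose __getitem__ returns the image and its landmarks together, plus a DataLoader that batches them. `ToyLandmarksDataset` and `to_tensor` are hypothetical names, and the dataset generates random images in memory instead of reading a CSV and image files, just so the example runs on its own.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader


class ToyLandmarksDataset(Dataset):
    """Hypothetical stand-in for FaceLandmarksDataset: random images + landmarks in memory."""

    def __init__(self, n_samples=8, n_points=68, height=64, width=48, transform=None):
        rng = np.random.default_rng(0)
        # Images stored H x W x C, like the output of skimage's io.imread
        self.images = rng.random((n_samples, height, width, 3)).astype(np.float32)
        # Landmarks are (x, y) pixel coordinates, one row per point
        xs = rng.uniform(0, width, size=(n_samples, n_points, 1))
        ys = rng.uniform(0, height, size=(n_samples, n_points, 1))
        self.landmarks = np.concatenate([xs, ys], axis=2).astype(np.float32)
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        # The labels (landmarks) are attached to the image here, in the Dataset
        sample = {"image": self.images[idx], "landmarks": self.landmarks[idx]}
        if self.transform:
            sample = self.transform(sample)
        return sample


def to_tensor(sample):
    # H x W x C numpy array -> C x H x W tensor, like the tutorial's ToTensor
    image = np.ascontiguousarray(sample["image"].transpose((2, 0, 1)))
    return {"image": torch.from_numpy(image),
            "landmarks": torch.from_numpy(sample["landmarks"])}


dataset = ToyLandmarksDataset(transform=to_tensor)
# The DataLoader only decides order and batch size; it collates whatever the Dataset returns
loader = DataLoader(dataset, batch_size=4, shuffle=True)
batch = next(iter(loader))
print(batch["image"].shape)      # torch.Size([4, 3, 64, 48])
print(batch["landmarks"].shape)  # torch.Size([4, 68, 2])
```

The DataLoader never receives a separate label argument; its default collate function simply stacks each key of the returned dictionaries into a batch.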
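On the rescaling point, a small worked example of the landmark line in Rescale may help with the “h and w are swapped” comment. The sizes and landmark values below are made up, and only the landmark arithmetic is shown (the image itself would still be resized with skimage’s transform.resize, as in the tutorial). Because each landmark row is (x, y), i.e. (column, row), x has to be multiplied by the width ratio and y by the height ratio, the reverse of the (h, w) order used for the image shape.

```python
import numpy as np

# Made-up sizes: a 100 x 200 (h x w) image rescaled to 50 x 50,
# so the width shrinks by 4x but the height only by 2x.
h, w = 100, 200
new_h, new_w = 50, 50

# Each row is one landmark as (x, y) = (column, row) in pixels.
landmarks = np.array([[150.0, 80.0],
                      [40.0, 60.0]])

# Same expression as in the tutorial's Rescale: x uses the width ratio, y the height ratio.
scaled = landmarks * [new_w / w, new_h / h]
print(scaled)  # rows become (37.5, 40.0) and (10.0, 30.0)

# Using (h, w) order by mistake would scale x by 0.5 and y by 0.25 instead.
wrong = landmarks * [new_h / h, new_w / w]
print(wrong)  # rows become (75.0, 20.0) and (20.0, 15.0)
```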
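As for writing your own geometric transform, here’s what one could look like for the horizontal flip mentioned in the question. It’s a sketch that assumes the tutorial’s sample format (a dict with an H x W x C NumPy image and an N x 2 array of (x, y) landmarks); the class name is hypothetical, and this is not torchvision’s RandomHorizontalFlip. It assumes landmark x values are pixel column indices, so column x maps to w - 1 - x; if your annotations use continuous coordinates, w - x may be the better convention. It also does not reorder the points: after a flip, a point annotated as the left eye lies on the right side of the face, so for face landmarks you may want to permute the rows as well.

```python
import numpy as np


class RandomHorizontalFlipWithLandmarks:
    """Hypothetical dict-based transform in the style of the tutorial's Rescale and RandomCrop."""

    def __init__(self, p=0.5, rng=None):
        self.p = p  # probability of flipping
        self.rng = rng if rng is not None else np.random.default_rng()

    def __call__(self, sample):
        image, landmarks = sample["image"], sample["landmarks"]
        if self.rng.random() < self.p:
            w = image.shape[1]  # image is H x W x C, so width is axis 1
            # Flip the columns; .copy() avoids negative strides, which torch.from_numpy rejects
            image = image[:, ::-1].copy()
            # astype returns a new array, so the caller's landmarks are left untouched
            landmarks = landmarks.astype(float)
            # Mirror the x (column) coordinates; y (row) coordinates are unchanged
            landmarks[:, 0] = (w - 1) - landmarks[:, 0]
        return {"image": image, "landmarks": landmarks}


# Tiny check: a 4 x 5 image with one bright pixel at row 2, column 1,
# and a landmark placed on that pixel.
image = np.zeros((4, 5, 3))
image[2, 1] = 1.0
sample = {"image": image, "landmarks": np.array([[1.0, 2.0]])}

flipped = RandomHorizontalFlipWithLandmarks(p=1.0)(sample)
print(flipped["landmarks"])    # the landmark moves to x = 3, y = 2
print(flipped["image"][2, 3])  # the bright pixel is now at column 3
```

Since the landmark and the bright pixel move to the same place, the flip keeps them aligned; a transform like this can go into the same transforms.Compose list as the tutorial’s Rescale and RandomCrop.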
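For the non-geometric built-ins like ColorJitter: they operate on an image rather than on the tutorial’s sample dictionary, so one way to use them is a small wrapper that applies the transform to sample['image'] and passes the landmarks through unchanged. ImageOnly is a hypothetical name, and to keep the sketch self-contained, a plain function that darkens the image stands in for ColorJitter.

```python
import numpy as np


class ImageOnly:
    """Hypothetical wrapper: apply a pixel-value-only transform to the image, keep landmarks as-is."""

    def __init__(self, transform):
        self.transform = transform

    def __call__(self, sample):
        return {"image": self.transform(sample["image"]),
                "landmarks": sample["landmarks"]}


# Stand-in for a photometric transform such as ColorJitter: halve every pixel value.
darken = ImageOnly(lambda img: img * 0.5)

sample = {"image": np.ones((4, 5, 3)), "landmarks": np.array([[1.0, 2.0]])}
out = darken(sample)
print(out["image"].max())  # 0.5
print(out["landmarks"])    # unchanged
```

With torchvision this might look like ImageOnly(transforms.ColorJitter(brightness=0.2, contrast=0.2)) or ImageOnly(transforms.Normalize(mean, std)) inside the transforms.Compose list. ColorJitter expects a PIL image or a tensor, and Normalize expects a float tensor of shape (C, H, W), not a NumPy array, so they would go after the tutorial’s ToTensor; check the torchvision docs for the exact input types your version accepts.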
Hope this helps!
Andrei, thank you so much for the detailed answer.
This is my first experience with PyTorch, hence tons of questions!
Just to clarify:
- If I understood your answer correctly, I need to incorporate labels in the dataset itself, not in the dataloader, right?
- Got it
- Could you please give some examples of possible augmentations, besides ColorJitter, for which the built-ins can be used?
Thanks for your help!
Thank you so much, you’ve helped me a lot!