The VOC 2012 dataset consists of images and their corresponding segmentation maps. I want to apply the same transforms to both the image and its segmentation map while loading. Any suggestions on how to proceed?

We don’t have a ready solution implemented, but there has been some discussion in a torchvision issue.

@Gaurav_Pandey you can easily adapt a dataset to handle co_transforms in the __call__ function (e.g. see this gist, which has a general structure that handles co_transforms):

and here are some relevant affine transforms to actually use – you’ll see the transforms must take in two arguments for the input and target images:
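In case the links don't come through, here is a rough sketch of the idea rather than the actual gist or transform code. The names `JointRandomHorizontalFlip` and `PairedDataset` are made up for illustration, the arrays are assumed to be NumPy arrays laid out as H x W x C (image) and H x W (label map), and a horizontal flip stands in for the affine transforms mentioned above. The key point is that the transform's `__call__` takes both the input and the target and makes a single random decision for the pair:

```python
import random

import numpy as np


class JointRandomHorizontalFlip:
    """Flip image and target together with probability p (hypothetical name)."""

    def __init__(self, p=0.5, rng=None):
        self.p = p
        # A dedicated RNG makes the behaviour reproducible when seeded.
        self.rng = rng if rng is not None else random.Random()

    def __call__(self, image, target):
        # One random draw decides for both arrays, so they stay aligned.
        if self.rng.random() < self.p:
            image = np.flip(image, axis=1).copy()    # H x W x C: flip the width axis
            target = np.flip(target, axis=1).copy()  # H x W label map: same flip
        return image, target


class PairedDataset:
    """Yields (image, target) pairs and applies a co_transform to both.

    Hypothetical sketch: a real version would likely subclass
    torch.utils.data.Dataset and read the VOC files from disk.
    """

    def __init__(self, images, targets, co_transform=None):
        if len(images) != len(targets):
            raise ValueError("images and targets must have the same length")
        self.images = images
        self.targets = targets
        self.co_transform = co_transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        image, target = self.images[index], self.targets[index]
        if self.co_transform is not None:
            # The co_transform is called with two arguments and returns two values.
            image, target = self.co_transform(image, target)
        return image, target
```

With something like `PairedDataset(images, targets, co_transform=JointRandomHorizontalFlip())`, every sample gets the same flip on both arrays. For transforms that interpolate (rotation, scaling), the segmentation map usually needs nearest-neighbour resampling so that the class labels stay valid integers.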

Thanks guys. That was very helpful.

I also implemented a dataset for VOC2012.