Joint Transforms for torchvision.datasets.VOCSegmentation

Hello,

I built the latest torchvision from source and am trying to apply the same data augmentation (e.g. a horizontal flip) to both the input and the target. The VOCSegmentation class has a transforms argument, but I’m not sure how to use it. Do I have to write my own joint transform?

I tried writing my own joint transform:

import random
import torchvision
import torchvision.transforms.functional as F

class JointHorFlip(object):
    # Draw one random number so image and target share the flip decision
    def __call__(self, img, target):
        if random.random() < 0.5:
            return F.hflip(img), F.hflip(target)
        return img, target

train_set = torchvision.datasets.VOCSegmentation(
    root='../data',
    year='2012',
    image_set='train',
    transforms=JointHorFlip(),
)

but when I inspect the samples, the inputs and targets still aren’t flipped consistently. What am I missing?

The code looks fine to me. I tried to reproduce it, but the download site has apparently been down since yesterday: http://host.robots.ox.ac.uk/pascal/VOC/voc2012/
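
In the meantime, you can sanity-check the transform itself without downloading anything by calling it on a dummy image/target pair. A minimal sketch, assuming the JointHorFlip class above; the marker pixels and the fixed seed are just placeholders for illustration:

import random
from PIL import Image

# Put an off-center marker in both the image and the mask so a flip is visible.
img = Image.new('RGB', (4, 2))
img.putpixel((0, 0), (255, 0, 0))   # red marker on the left edge
target = Image.new('L', (4, 2))
target.putpixel((0, 0), 255)        # matching marker in the mask

random.seed(0)  # make the coin flip repeatable
out_img, out_target = JointHorFlip()(img, target)

# Whether or not the flip fired, the two markers must agree.
img_flipped = out_img.getpixel((3, 0)) == (255, 0, 0)
target_flipped = out_target.getpixel((3, 0)) == 255
assert img_flipped == target_flipped, 'image and mask were flipped differently'

If that assertion holds across many seeds, the bug is more likely in how the samples are being inspected than in the transform itself.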

CC @fmassa: did they move the dataset or is the site just down at the moment?