VOCDetection year='2007' Problem

I am running into a problem with the following call:
torchvision.datasets.VOCDetection('/content', year='2007', image_set='train', download=True)
It downloads and extracts VOCtest_06-Nov-2007.tar instead of VOCtrainval_06-Nov-2007.tar. My guess is that this is caused by recent changes to voc.py. I am not an expert, but I think the problem is here: vision/voc.py at c808d163f6c65ca851db59e9966807a9220fc493 · pytorch/vision · GitHub [lines 88-94]
For now I am using the following hack, which seems to work:

import torchvision

# Register the trainval entry under a key that the year == "2007" branch does not rewrite.
torchvision.datasets.voc.DATASET_YEAR_DICT['2007-trainval'] = torchvision.datasets.voc.DATASET_YEAR_DICT['2007']

dummy = torchvision.datasets.VOCDetection('/content', year='2007-trainval', image_set='train', download=True)

This basically downloads and extracts the right tar file and subsequent commands work.
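
To illustrate why the alias sidesteps the problem, here is a minimal sketch that mimics the key lookup with a stand-in dictionary rather than the real torchvision module. The `select_key` helper and the trimmed `DATASET_YEAR_DICT` are illustrative assumptions, not torchvision code; only the two archive filenames come from this thread. It assumes the affected version rewrites the key only when the year is exactly '2007'.

```python
# Stand-in for torchvision.datasets.voc.DATASET_YEAR_DICT (filenames only).
DATASET_YEAR_DICT = {
    "2007": {"filename": "VOCtrainval_06-Nov-2007.tar"},
    "2007-test": {"filename": "VOCtest_06-Nov-2007.tar"},
}


def select_key(year):
    """Assumed behaviour of the affected version: '2007' is always rewritten."""
    return "2007-test" if year == "2007" else year


# The workaround: expose the trainval entry under a name the branch ignores.
DATASET_YEAR_DICT["2007-trainval"] = DATASET_YEAR_DICT["2007"]

buggy = DATASET_YEAR_DICT[select_key("2007")]["filename"]
worked_around = DATASET_YEAR_DICT[select_key("2007-trainval")]["filename"]

print(buggy)
print(worked_around)
```

Because '2007-trainval' is not equal to '2007', the lookup falls through to the aliased entry and resolves to the trainval archive.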

Yep, good call. It seems that whenever the year is "2007", the code switches the key to "2007-test" regardless of image_set, so you only ever load the test data. It might be a good idea to create an issue for this on the PyTorch GitHub. If you don't want to, I would be happy to.

        self.year = year

        valid_image_sets = ["train", "trainval", "val"]
        if year == "2007":
            valid_image_sets.append("test")
            key = "2007-test"
        else:
            key = year
        self.image_set = verify_str_arg(image_set, "image_set", valid_image_sets)
        dataset_year_dict = DATASET_YEAR_DICT[key]

And in DATASET_YEAR_DICT we have:

    '2007': {
        'url': 'http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar',
        'filename': 'VOCtrainval_06-Nov-2007.tar',
        'md5': 'c52e279531787c972589f7e41ab4ae64',
        'base_dir': os.path.join('VOCdevkit', 'VOC2007')
    },
    '2007-test': {
        'url': 'http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar',
        'filename': 'VOCtest_06-Nov-2007.tar',
        'md5': 'b6e924de25625d8de591ea690078ad9f',
        'base_dir': os.path.join('VOCdevkit', 'VOC2007')
    }
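
One possible fix, sketched below under assumptions, is to switch to the "2007-test" entry only when the test split is actually requested. Both helper functions are hypothetical names written for this illustration; they are not part of torchvision, and the first one only mirrors the branch quoted above.

```python
def select_key_buggy(year, image_set):
    # Mirrors the quoted branch: image_set is never consulted.
    return "2007-test" if year == "2007" else year


def select_key_fixed(year, image_set):
    # Hypothetical fix: use the test archive only for the test split.
    if year == "2007" and image_set == "test":
        return "2007-test"
    return year


for image_set in ("train", "trainval", "val", "test"):
    print(
        image_set,
        select_key_buggy("2007", image_set),
        select_key_fixed("2007", image_set),
    )
```

With this change, year='2007' with image_set='train' would resolve to the '2007' entry (the trainval archive), while image_set='test' would still resolve to '2007-test'.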

Thanks a lot, Patrick, for the confirmation. Following your suggestion, I opened an issue here: VOCDetection year='2007' Problem · Issue #53971 · pytorch/pytorch · GitHub. Please let me know if I haven't represented the problem accurately.