Trying to understand the VOC2012 dataset for semantic segmentation with PyTorch

Hi all,

I want to use the PASCAL VOC 2012 dataset for semantic segmentation with PyTorch.

First, I'd like to understand the dataset. After that, I want to build a semantic segmentation model from scratch in PyTorch to learn more.

I have just downloaded the VOC2012 dataset, and the folder VOCdevkit/VOC2012/JPEGImages contains 17125 images.

But when I run this code:

import os

def make_datapath_list(voc2012_path):
    img_dir = os.path.join(voc2012_path, "JPEGImages")
    anno_dir = os.path.join(voc2012_path, "Annotations")

    subsets = ["train", "val"]
    train_img_list, train_anno_list, val_img_list, val_anno_list = [], [], [], []

    for subset in subsets:
        subset_samples_path = os.path.join(voc2012_path, f"ImageSets/Main/{subset}.txt")
        with open(subset_samples_path, "r") as f:
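            # One image ID per line; skip blank lines.
            subset_samples = [line.strip() for line in f if line.strip()]

        for sample_name in subset_samples:
            img_path = os.path.join(img_dir, f"{sample_name}.jpg")
            anno_path = os.path.join(anno_dir, f"{sample_name}.xml")

            # Route each image/annotation pair to the list for its subset.
            if subset == "train":
                train_img_list.append(img_path)
                train_anno_list.append(anno_path)
            elif subset == "val":
                val_img_list.append(img_path)
                val_anno_list.append(anno_path)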

    return train_img_list, train_anno_list, val_img_list, val_anno_list

voc2012_path = "./VOCdevkit/VOC2012"
train_img_list, train_anno_list, val_img_list, val_anno_list = make_datapath_list(voc2012_path)
print("trainlist: ", len(train_img_list))
print("vallist: ", len(val_img_list))
print("train_anno_list: ", len(train_anno_list))
print("val_anno_list: ", len(val_anno_list))

I get this result:

trainlist: 5717
vallist: 5823
train_anno_list: 5717
val_anno_list: 5823

Why do the two splits add up to 5717 + 5823 = 11540 instead of 17125?

Did I make a mistake, or did I misunderstand the dataset?

Thanks for your help.
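
One way to investigate the gap is to compare the files in JPEGImages with the IDs listed under each subfolder of ImageSets. As far as I know, JPEGImages is shared by all VOC tasks, while ImageSets/Main only lists the classification/detection splits, so images used only by other tasks would not show up in train.txt or val.txt. The sketch below assumes the standard VOCdevkit layout and that the first token on each line of a split file is the image ID; `read_ids` and `summarize_voc_ids` are hypothetical helper names, not part of any library.

```python
import os


def read_ids(split_file):
    """Return the set of image IDs in a split file (first token of each line)."""
    with open(split_file, "r") as f:
        return {line.split()[0] for line in f if line.strip()}


def summarize_voc_ids(voc_root):
    """Count images on disk and how many of them each ImageSets task references."""
    jpeg_dir = os.path.join(voc_root, "JPEGImages")
    on_disk = {
        os.path.splitext(name)[0]
        for name in os.listdir(jpeg_dir)
        if name.endswith(".jpg")
    }

    summary = {"JPEGImages": len(on_disk)}
    referenced = set()

    sets_dir = os.path.join(voc_root, "ImageSets")
    for task in sorted(os.listdir(sets_dir)):
        task_dir = os.path.join(sets_dir, task)
        if not os.path.isdir(task_dir):
            continue
        # Union of IDs over every .txt split file belonging to this task.
        task_ids = set()
        for name in os.listdir(task_dir):
            if name.endswith(".txt"):
                task_ids |= read_ids(os.path.join(task_dir, name))
        summary[task] = len(task_ids & on_disk)
        referenced |= task_ids

    # Images on disk that no split file mentions at all.
    summary["unlisted"] = len(on_disk - referenced)
    return summary


if __name__ == "__main__":
    voc2012_path = "./VOCdevkit/VOC2012"  # same path as in the question
    if os.path.isdir(voc2012_path):
        for key, count in summarize_voc_ids(voc2012_path).items():
            print(f"{key}: {count}")
```

If the count for Main matches 11540 and the other task folders account for the remaining images, the difference comes from how the dataset is organized rather than from a bug in the code.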

Hi All,

For those who are interested, I found the solution to my question on page 6 of this website.
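
For the semantic segmentation goal mentioned at the start, note that the Annotations folder holds the XML bounding-box files used for detection. The segmentation splits are listed under ImageSets/Segmentation, and the per-pixel class masks are PNG files in SegmentationClass. Here is a minimal sketch of a path list for that task, assuming the standard VOCdevkit layout; `make_segmentation_path_list` is a hypothetical helper name chosen to mirror the function in the question.

```python
import os


def make_segmentation_path_list(voc2012_path, subset):
    """Return (image paths, mask paths) for one segmentation split, e.g. "train"."""
    split_file = os.path.join(
        voc2012_path, "ImageSets", "Segmentation", f"{subset}.txt"
    )
    with open(split_file, "r") as f:
        sample_names = [line.strip() for line in f if line.strip()]

    # Each image ID maps to a JPEG image and a PNG mask with the same stem.
    img_list = [
        os.path.join(voc2012_path, "JPEGImages", f"{name}.jpg")
        for name in sample_names
    ]
    mask_list = [
        os.path.join(voc2012_path, "SegmentationClass", f"{name}.png")
        for name in sample_names
    ]
    return img_list, mask_list


if __name__ == "__main__":
    voc2012_path = "./VOCdevkit/VOC2012"  # same path as in the question
    if os.path.isdir(voc2012_path):
        for subset in ["train", "val"]:
            imgs, masks = make_segmentation_path_list(voc2012_path, subset)
            print(f"{subset}: {len(imgs)} images, {len(masks)} masks")
```

The two lists stay aligned by index, so they can be passed straight to a custom `Dataset` that loads one image and its mask per item.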