Dataloader index keeps changing

I have a dataset class defined as:

import pickle

from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms


class SomeDataset(Dataset):
    def __init__(self, usage='val', dataset_pickle_file='./some.pkl', skip_every_n_image=1):
        super().__init__()

        self.to_tensor = transforms.ToTensor()

        # data_info[0] holds color image paths, data_info[1] the matching depth paths
        with open(dataset_pickle_file, 'rb') as file:
            self.data_info = pickle.load(file)[usage]
        self.idx = list(range(0, len(self.data_info[0]), skip_every_n_image))
        self.data_len = len(self.idx)

    def __len__(self):
        return self.data_len

    def __getitem__(self, index):  # 1
        instanceNum = self.idx[index]
        color_img = self.data_info[0][instanceNum]
        color_img = Image.open(color_img)
        color_tensor = self.to_tensor(color_img)
        d_img = self.data_info[1][instanceNum]  # 2
        d_img = Image.open(d_img)
        d_tensor = self.to_tensor(d_img)
        output = {'image': color_tensor, 'dt': d_tensor}
        return output

I am running the code in debug mode in PyCharm. The initial value of index on entering __getitem__ appears to be 11936 (at #1).
But by the time execution reaches d_img, the values of index and instanceNum appear to be 1517 (at #2).
Isn't the value of index supposed to stay constant during a single call to __getitem__? The pickle file stores URLs to corresponding items of color_img and d_img, i.e., a given index of d_img is the depth map for the color_img at the same index. So if the value of index/instanceNum changes within the same call, I assume output ends up combining an image and a depth image from two different instances.

Can you all please suggest what I might be doing wrong? Thanks!

I doubt the index is changing within the same function call. Are you actually printing it, or only inspecting it in the debugger?
I suspect it may be due to multiprocessing: with num_workers > 0, the DataLoader spawns worker processes, and the debugger can show you locals from different workers' calls to __getitem__.
Try setting num_workers to 0 in the DataLoader and check whether the index still changes.
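To see that the index really is stable within one call, here is a minimal sketch (assuming a standard PyTorch install; IndexCheckDataset is a tiny stand-in, not your SomeDataset):

```python
from torch.utils.data import Dataset, DataLoader


class IndexCheckDataset(Dataset):
    """Tiny stand-in dataset that checks `index` is stable within one call."""

    def __len__(self):
        return 8

    def __getitem__(self, index):
        index_at_1 = index  # corresponds to point #1 above
        # ... any amount of work in between ...
        index_at_2 = index  # corresponds to point #2 above
        # Within a single call, the argument never changes:
        assert index_at_1 == index_at_2
        return index


# num_workers=0 loads everything in the main process, so the debugger
# only ever sees one call's local variables at a time.
loader = DataLoader(IndexCheckDataset(), batch_size=4, num_workers=0)
print([batch.tolist() for batch in loader])  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

If the asserts pass with num_workers=0 but the debugger still shows "jumping" values with num_workers > 0, you are watching different worker processes, not a single mutating call.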
