__getitem__ is called multiple times

I’m using a custom Dataset where the image paths are built from a Pandas DataFrame. Out of curiosity I added a print(idx) to __getitem__ and noticed that it is called twice (it prints two different indices) when I use 1 worker. With multiple workers it is called even more often, even though the batch size is 1.
Am I missing something? Shouldn’t only one image be loaded? The DataLoader does return just one image per iteration, regardless of the number of workers (as it should).

Could you please share the code?

It’s rather difficult to understand what it does without having the Pandas DataFrame (which I cannot share, I guess). But here’s the class:

import os

import numpy as np
from PIL import Image
from torch.utils.data import Dataset


class Data(Dataset):

    def __init__(self, mode, df, img_dir, site, transform):
        self.mode = mode
        self.df = df
        self.img_dir = img_dir
        self.site = site
        self.transform = transform
    
    def path_channel(self, channel, idx):
        experiment = self.df.loc[self.df.index[idx], 'experiment']
        plate = self.df.loc[self.df.index[idx], 'plate']
        well = self.df.loc[self.df.index[idx], 'well']

        path = os.path.join(self.img_dir, experiment, f'Plate{plate}', 
                            f'{well}_s{self.site}_w{channel}.png')
        
        return path
    
    def __getitem__(self, idx):
        print(idx)  # With num_workers=1 and batch_size=1, this prints twice (two different indices)

        # Iterate over channels of one image (from file)
        all_channels = [np.array(Image.open(self.path_channel(ch, idx)), 
                                dtype=np.float32) for ch in range(1, 7)]
        img = np.stack(all_channels, axis=2)  # Shape: (height, width, 6)
        
        if self.mode == 'train':
            label = self.df.loc[self.df.index[idx], 'label'].astype('int32')

            return img, label
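        elif self.mode == 'test':

            return img

        # Fail loudly instead of silently returning None for an unknown mode
        raise ValueError(f'Unknown mode: {self.mode!r}')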
    
    def __len__(self):
        return self.df.shape[0]

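Each worker creates complete batches on its own, calling your Dataset's __getitem__ once for every sample in the batch.
For num_workers=0, the main process loads each batch on demand, so with batch_size=1 you should see exactly one index printed per iteration. For num_workers>=1, the batches are loaded in separate worker processes, which already fetch the next batches while your loop is still working on the current one. That is why more indices are printed than batches are returned: the extra calls are samples loaded ahead of time that will be handed out in later iterations, not duplicates of the current one.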
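
To make the effect easier to see without the DataFrame or any images, here is a toy model in plain Python. It is only a sketch of the idea, not the DataLoader's actual implementation: LoggingDataset, prefetching_loader and the prefetch parameter are hypothetical names made up for this illustration, and a real DataLoader does the loading in worker processes rather than in a generator.

```python
from collections import deque


class LoggingDataset:
    """Stand-in for a Dataset that records every index passed to __getitem__."""

    def __init__(self, size):
        self.size = size
        self.calls = []

    def __len__(self):
        return self.size

    def __getitem__(self, idx):
        self.calls.append(idx)  # plays the role of print(idx)
        return idx * 10         # dummy "sample"


def prefetching_loader(dataset, prefetch=2):
    """Yield samples one at a time while loading `prefetch` samples ahead.

    Hypothetical helper: a simplified, single-process model of loading ahead.
    """
    buffer = deque()
    for idx in range(len(dataset)):
        buffer.append(dataset[idx])   # load a sample into the buffer
        if len(buffer) >= prefetch:
            yield buffer.popleft()    # hand out the oldest loaded sample
    while buffer:                     # drain what is left at the end
        yield buffer.popleft()


dataset = LoggingDataset(5)
loader = prefetching_loader(dataset, prefetch=2)

first = next(loader)
print(first)          # the single sample that was returned
print(dataset.calls)  # the indices that were loaded to get there
```

In this model, asking for one sample triggers two __getitem__ calls with two different indices, which mirrors what the print(idx) shows. The second sample is not wasted: it is returned on the next iteration. With prefetch=1 the model loads exactly one sample per iteration, which corresponds to the num_workers=0 case.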