Dataset interaction with a global array

Can the __getitem__ method of a Dataset class write to a global array while the dataset is iterated by a DataLoader?

from torch.utils.data import Dataset, DataLoader

indexes_seen = []


class MyDataset(Dataset):
    def __init__(self):
        print("constructor")
        self.dummy = list(range(100))

    def __len__(self):
        return len(self.dummy)

    def __getitem__(self, idx):
        global indexes_seen
        indexes_seen.append(idx)

        return self.dummy[idx]

ds = MyDataset()
dl = DataLoader(ds, batch_size=4, num_workers=4, shuffle=False)

for item in dl:
    break

print(indexes_seen)

>>> []

After the loop, indexes_seen is printed as empty, but if I print it inside __getitem__, I can see elements being added to it.

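With num_workers=4, each DataLoader worker is a separate process with its own copy of indexes_seen, so the appends made in __getitem__ never reach the list in the main process. You might need to share the actual memory between processes, as seen e.g. in this example.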
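
One possible way to do this, as a minimal sketch rather than a definitive solution: replace the global list with a multiprocessing.Manager list that is passed into the dataset. The names TrackingDataset and collect_seen_indices are made up for illustration and are not from the original post, and the sketch assumes a full pass over the data is acceptable.

```python
import multiprocessing as mp

from torch.utils.data import Dataset, DataLoader


class TrackingDataset(Dataset):
    """Hypothetical variant of MyDataset that records indices in a shared list."""

    def __init__(self, seen):
        self.dummy = list(range(100))
        # `seen` is a Manager list proxy, not a plain list: appends made in
        # worker processes are forwarded to the manager process.
        self.seen = seen

    def __len__(self):
        return len(self.dummy)

    def __getitem__(self, idx):
        self.seen.append(idx)
        return self.dummy[idx]


def collect_seen_indices(num_workers=4):
    """Iterate the whole dataset once and return the recorded indices, sorted."""
    with mp.Manager() as manager:
        seen = manager.list()
        ds = TrackingDataset(seen)
        dl = DataLoader(ds, batch_size=4, num_workers=num_workers, shuffle=False)
        for _ in dl:
            pass
        # Copy into a plain list before the manager shuts down; sort because
        # the workers append in no guaranteed order.
        return sorted(list(seen))


if __name__ == "__main__":
    print(collect_seen_indices()[:8])
```

Every append goes through the manager process, so this is slower than writing to a local list. If you only need to know which indices were visited, a preallocated tensor placed in shared memory could be a cheaper alternative.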
