I wrote the following code:
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self):
        print('created!')
        self.a = None

    def __getitem__(self, item):
        if self.a is None:
            print('set a to 1')
            self.a = 1
        return item

    def __len__(self):
        return 10

    def __del__(self):
        print("deleted")

if __name__ == '__main__':
    my_dataloader = DataLoader(MyDataset(), batch_size=2, num_workers=2)
    for data in my_dataloader:
        pass
And the output of the script is:
created!
set a to 1
set a to 1
deleted
There are two "set a to 1" lines in the output, and num_workers is set to 2, so I hypothesized that DataLoader copies the dataset instance once for each worker process. But only one "created!" and one "deleted" are printed, which suggests that there is only one instance of MyDataset. So what exactly is the behavior of DataLoader when num_workers > 0? How does it deal with the MyDataset instance?
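One thing worth checking before ruling out the copy hypothesis is whether a copy would print "created!" at all. Below is a minimal standard-library sketch, assuming the dataset reaches each worker as a pickled copy (as happens for arguments passed to a multiprocessing process under the "spawn" start method). `CountingDataset` is a hypothetical stand-in for MyDataset that counts `__init__` calls instead of printing; the sketch does not use torch and does not model the "fork" start method, where the child instead receives a memory copy of the parent.

```python
import pickle

class CountingDataset:
    """Hypothetical stand-in for MyDataset that counts __init__ calls."""
    init_calls = 0  # class-level counter shared by all instances in this process

    def __init__(self):
        CountingDataset.init_calls += 1
        self.a = None

    def __getitem__(self, item):
        if self.a is None:
            self.a = 1  # lazily set per instance, like MyDataset
        return item

original = CountingDataset()
# Simulate handing the dataset to two workers: each gets its own unpickled copy.
copies = [pickle.loads(pickle.dumps(original)) for _ in range(2)]

for c in copies:
    c[0]  # each copy runs its own lazy initialisation

print("init calls:", CountingDataset.init_calls)  # init calls: 1
print("original.a:", original.a)                  # original.a: None
print("copies' a:", [c.a for c in copies])        # copies' a: [1, 1]
```

Unpickling rebuilds each object without calling `__init__`, so two independent copies can each set `a` once while "created!" appears only once.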