Using DataLoader with num_workers>0 causes re-run of script

I tried to make a minimal reproduction of the code:

from torch.utils.data import DataLoader
from torch.utils.data.sampler import Sampler
from torch.utils.data import Dataset
import numpy as np

print('running this script')

class Basic_Sampler(Sampler):
    def __init__(self, samples_list):
        self.sample_list = samples_list
        self.epoch_size = len(samples_list)
    def __iter__(self):
        return iter(self.sample_list)
    def __len__(self):
        return self.epoch_size

class fake_dataset(Dataset):
    def __init__(self):
        pass
    def __getitem__(self, item):
        return np.zeros(5)

sampler = Basic_Sampler([1,2,3])
dl = DataLoader(fake_dataset(), batch_size=2, shuffle=False, sampler=sampler, num_workers=0)

next(iter(dl))  # iter(dl).next() is Python 2 syntax; use next() in Python 3

If you run this with num_workers=0, it prints running this script and returns as expected.
If you run this with num_workers=1, the whole script runs twice for some reason; it prints:

running this script
running this script

and then it crashes with: RuntimeError: DataLoader worker (pid(s) 18984) exited unexpectedly

I'm running:

Am I missing something here?

Edit:
On Colab (https://colab.research.google.com) it works without issue. Does that mean it is a Windows problem?

On Windows you need to wrap the entry point in an if __name__ == '__main__': guard, as explained in the Windows FAQ, to avoid this behavior.
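A minimal sketch of what that looks like for the script above. On Windows, DataLoader workers are started with the spawn method, which re-imports the main module in each worker process, so any code at module level (including the print) runs again; moving the DataLoader setup under the guard keeps the workers from re-executing it:

```python
from torch.utils.data import DataLoader, Dataset, Sampler
import numpy as np

print('running this script')  # without the guard below, spawned workers print this again

class Basic_Sampler(Sampler):
    def __init__(self, samples_list):
        self.sample_list = samples_list
        self.epoch_size = len(samples_list)
    def __iter__(self):
        return iter(self.sample_list)
    def __len__(self):
        return self.epoch_size

class fake_dataset(Dataset):
    def __init__(self):
        pass
    def __getitem__(self, item):
        return np.zeros(5)

def main():
    sampler = Basic_Sampler([1, 2, 3])
    dl = DataLoader(fake_dataset(), batch_size=2, shuffle=False,
                    sampler=sampler, num_workers=1)
    # default collate stacks the two np.zeros(5) arrays into one [2, 5] tensor
    batch = next(iter(dl))
    return batch

if __name__ == '__main__':
    # Everything that starts worker processes must live under this guard on Windows,
    # because the workers re-import this module and would otherwise recurse.
    main()
```

The guard is the same pattern Python's own multiprocessing documentation requires for the spawn start method; on Linux/macOS (fork) the script happens to work without it, which is why the Colab run succeeds.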
