I’m struggling with a bug where the tensors returned by my DataLoader contain garbage when num_workers > 1. I assume this is caused by the underlying dataset using a single file pointer (the whole dataset is one file) that gets copied to the workers when they are forked.
What’s the best way to distribute such a dataset to multiple workers? Is there an easy way of instantiating the dataset separately / cloning the file pointer for each worker, or is there a better approach altogether?
Are you currently passing a worker_init_fn to the DataLoader? It is a function that gets called in each worker process after it starts, so you can use it for per-worker initialization.
import torch

def worker_init_fn(worker_id):
    worker_info = torch.utils.data.get_worker_info()
    # worker_info.dataset is the copy of the dataset object living in this
    # worker process; it is a different object from the one in the main process.
    dataset = worker_info.dataset
    # Re-instantiate the dataset here, or open a fresh file handle on this copy.
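For context, here is a minimal sketch of how that can fit together. It assumes the dataset is backed by a single binary file of fixed-size float32 records; the names SingleFileDataset, RECORD_SIZE, and data.bin are made up for illustration and would need to be adapted to your actual file format:

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

RECORD_SIZE = 128  # hypothetical: number of float32 values per sample


class SingleFileDataset(Dataset):
    def __init__(self, path, num_samples):
        self.path = path
        self.num_samples = num_samples
        self.fp = None  # opened lazily, so each worker gets its own handle

    def _ensure_open(self):
        if self.fp is None:
            self.fp = open(self.path, "rb")

    def __len__(self):
        return self.num_samples

    def __getitem__(self, idx):
        self._ensure_open()
        self.fp.seek(idx * RECORD_SIZE * 4)  # 4 bytes per float32
        buf = self.fp.read(RECORD_SIZE * 4)
        return torch.from_numpy(np.frombuffer(buf, dtype=np.float32).copy())


def worker_init_fn(worker_id):
    # get_worker_info().dataset is this worker's own copy of the dataset,
    # so clearing the handle here forces each worker to reopen the file itself.
    dataset = torch.utils.data.get_worker_info().dataset
    dataset.fp = None


loader = DataLoader(
    SingleFileDataset("data.bin", num_samples=1000),  # hypothetical file
    batch_size=32,
    num_workers=4,
    worker_init_fn=worker_init_fn,
)
```

Because the handle is only opened on first use, each worker ends up with its own file object even when the workers are forked; the worker_init_fn just makes sure a handle that was already opened in the main process (e.g. by reading a sample before training) isn't reused in the workers.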
These sections in the documentation may be helpful: