Multiple workers, DDP & pickle files

Hello everyone,
I am trying to train a UNet model using DDP. The dataset is a map-style dataset stored in pickle files. If I set the number of DataLoader workers to more than 0, it fails with the error below, even when I use a single GPU. Is there anything I should take into account when working with pickle files under DDP?

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/user1/.local/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/home/user1/training_functions.py", line 469, in train_net_ddp
    eval_net(model=model, classes=classes, criterion=focal_criterion, val_loader=val_dataloader, cuda=cuda, device=index, writer=None,
  File "/home/user1/.local/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/home/user1/training_functions.py", line 146, in eval_net
    for idx, data in enumerate(val_loader):
  File "/home/user1/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 359, in __iter__
    return self._get_iterator()
  File "/home/user1/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 305, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/home/user1/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 918, in __init__
    w.start()
  File "/home/user1/anaconda3/envs/bttenv/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/home/user1/anaconda3/envs/bttenv/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/home/user1/anaconda3/envs/bttenv/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/home/user1/anaconda3/envs/bttenv/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/home/user1/anaconda3/envs/bttenv/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/home/user1/anaconda3/envs/bttenv/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/home/user1/anaconda3/envs/bttenv/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'get_dataloaders_generated_data.<locals>.dataset'

DDP is agnostic to the underlying dataset’s format. The issue seems to be related to the DataLoader instead. Have you tried loading your dataset without DDP?
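To isolate it, something along the lines of the sketch below should reproduce the failure without DDP (the dataset class and file names here are placeholders, not your code). Note that on Linux the DataLoader's default fork start method does not pickle the dataset, so multiprocessing_context="spawn" is needed to mimic what happens under torch.multiprocessing.spawn:

import pickle
import torch
from torch.utils.data import Dataset, DataLoader

class PickleDataset(Dataset):
    """Minimal map-style dataset backed by one pickle file per sample."""
    def __init__(self, paths):
        self.paths = paths

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        with open(self.paths[idx], "rb") as f:
            return pickle.load(f)

if __name__ == "__main__":
    # Create two tiny pickle files so the example is self-contained.
    paths = []
    for i in range(2):
        path = f"sample_{i}.pkl"
        with open(path, "wb") as f:
            pickle.dump(torch.zeros(3), f)
        paths.append(path)

    loader = DataLoader(
        PickleDataset(paths),
        batch_size=1,
        num_workers=2,                    # >0 forces worker processes
        multiprocessing_context="spawn",  # spawn pickles the dataset, like under mp.spawn
    )
    for batch in loader:  # workers start on first iteration
        pass

If this fails with the same AttributeError, the problem is in how the dataset is defined, not in DDP.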

Hello @cbalioglu, thanks for your response. I managed to solve the issue but forgot to close the topic. You are correct: the DataLoader was in fact the issue. For some reason, the custom dataset class was defined inside a function, get_data_loaders; this answer helped me reach that conclusion.
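For anyone else who lands here, a stripped-down sketch of the failure mode and the fix (the names are illustrative, not the original code). pickle serializes a class by its importable qualified name; a class defined inside a function is a "local object" with no such name, which is exactly what the spawned workers need:

from torch.utils.data import Dataset, DataLoader

def get_dataloaders_broken(paths):
    # Broken: "dataset" is local to this function, so pickling an instance
    # in a spawned worker raises
    # AttributeError: Can't pickle local object '...<locals>.dataset'
    class dataset(Dataset):
        def __init__(self, paths):
            self.paths = paths
        def __len__(self):
            return len(self.paths)
        def __getitem__(self, idx):
            return self.paths[idx]
    return DataLoader(dataset(paths), num_workers=2,
                      multiprocessing_context="spawn")

# Fixed: the class lives at module top level, so worker processes can
# re-import it by name when the DataLoader pickles the dataset.
class MapStyleDataset(Dataset):
    def __init__(self, paths):
        self.paths = paths
    def __len__(self):
        return len(self.paths)
    def __getitem__(self, idx):
        return self.paths[idx]

def get_dataloaders_fixed(paths):
    return DataLoader(MapStyleDataset(paths), num_workers=2,
                      multiprocessing_context="spawn")

The same constraint applies to anything else the DataLoader has to send to spawned workers, e.g. a lambda passed as collate_fn will fail to pickle in the same way.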
