Running into a MemoryError when using torch.multiprocessing together with DataLoader and numpy views

Hi,

I am trying to get DistributedDataParallel running for inference. On my way there I stumbled across a problem which is largely independent of it: I run into a MemoryError as soon as I want to use numpy views:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/opt/user_software/miniconda3_envs/tomotwin_pt2/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/mnt/data/twagner/Projects/TomoTwin/results/202208_YenT_step3/mem_simpel.py", line 54, in run
    for batch in volume_loader:
  File "/opt/user_software/miniconda3_envs/tomotwin_pt2/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 441, in __iter__
    return self._get_iterator()
  File "/opt/user_software/miniconda3_envs/tomotwin_pt2/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 388, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/opt/user_software/miniconda3_envs/tomotwin_pt2/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1042, in __init__
    w.start()
  File "/opt/user_software/miniconda3_envs/tomotwin_pt2/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/opt/user_software/miniconda3_envs/tomotwin_pt2/lib/python3.10/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/opt/user_software/miniconda3_envs/tomotwin_pt2/lib/python3.10/multiprocessing/context.py", line 288, in _Popen
    return Popen(process_obj)
  File "/opt/user_software/miniconda3_envs/tomotwin_pt2/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/opt/user_software/miniconda3_envs/tomotwin_pt2/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/opt/user_software/miniconda3_envs/tomotwin_pt2/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/opt/user_software/miniconda3_envs/tomotwin_pt2/lib/python3.10/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
MemoryError

I used the views before with DataParallel (without torch.multiprocessing) and it worked. Obviously, torch.multiprocessing tries to serialize the data and crashes with memory problems.

Here is a code snippet to reproduce the problem:
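In sketch form it looks roughly like this (simplified; the volume shape, the dataset class name, the window size, and the loader settings are placeholders, not the exact script). The dataset holds a sliding-window view over a large 3D volume; as soon as the DataLoader uses worker processes, the dataset gets pickled to each worker, and pickling the non-contiguous view materializes it as one enormous contiguous copy:

```python
import numpy as np
import torch
import torch.multiprocessing as mp
from numpy.lib.stride_tricks import sliding_window_view
from torch.utils.data import Dataset, DataLoader


class SlidingWindowDataset(Dataset):
    """Serves sliding windows of a large 3D volume as numpy views."""

    def __init__(self, volume: np.ndarray, window: int = 37):
        # 6D view of shape (D-w+1, H-w+1, W-w+1, w, w, w); no data is copied here.
        self.windows = sliding_window_view(volume, (window, window, window))
        self.grid = self.windows.shape[:3]

    def __len__(self):
        return int(np.prod(self.grid))

    def __getitem__(self, idx):
        z, y, x = np.unravel_index(idx, self.grid)
        return torch.from_numpy(np.ascontiguousarray(self.windows[z, y, x]))


def run(rank):
    volume = np.random.rand(300, 300, 300).astype(np.float32)
    ds = SlidingWindowDataset(volume)
    # num_workers > 0 makes the DataLoader pickle the dataset to each worker
    # (spawn start method); pickling the sliding-window view expands the
    # overlapping windows into a huge contiguous array -> MemoryError.
    loader = DataLoader(ds, batch_size=16, num_workers=2)
    for batch in loader:
        pass


if __name__ == "__main__":
    mp.spawn(run, nprocs=1)
```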

Is anyone aware of an elegant solution?

Best,
Thorsten

This is a DataLoader-related question; you might have more luck asking under the data category.

Thanks for the response! I changed the category to data. Let's see if someone there can help me :slight_smile:

Well, if I simply use the positions of the sliding windows instead of the views themselves, it works (see the sketch below). However, it would be nice to know whether one could get it working with the views.
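For reference, the position-based workaround looks roughly like this (again a simplified sketch with placeholder names and shapes): the dataset stores only the base volume and the window start coordinates and slices on demand in __getitem__, so there are no precomputed views to serialize.

```python
import numpy as np
import torch
from torch.utils.data import Dataset


class PositionDataset(Dataset):
    """Stores window start positions instead of precomputed views."""

    def __init__(self, volume: np.ndarray, window: int = 37, stride: int = 8):
        self.volume = volume
        self.window = window
        # Only small integer triples are stored; windows are cut on demand.
        self.positions = [
            (z, y, x)
            for z in range(0, volume.shape[0] - window + 1, stride)
            for y in range(0, volume.shape[1] - window + 1, stride)
            for x in range(0, volume.shape[2] - window + 1, stride)
        ]

    def __len__(self):
        return len(self.positions)

    def __getitem__(self, idx):
        z, y, x = self.positions[idx]
        w = self.window
        crop = self.volume[z:z + w, y:y + w, x:x + w]
        return torch.from_numpy(np.ascontiguousarray(crop))
```

Pickling this dataset still copies the base volume once per worker, but that is only the raw array rather than the expanded set of overlapping windows.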

Could you check if your issue might be related to this one?
