`persistent_workers=True` for the DataLoader causes errors

Hello all,

I upgraded from PyTorch 1.6 to 1.7 today because I wanted to run some experiments with custom datasets and the new persistent_workers argument of the DataLoader class.

When I run my experiments, I keep getting errors raised from PyTorch's internal pinned-memory code. To make the issue reproducible, I also ran the ResNet on ImageNet example from the PyTorch repo [1]. The only thing I changed is that I fixed the number of classes to 200, because I am using a small subset of ImageNet for faster experimentation.

This is the error that I’m receiving:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/lbhm/miniconda3/envs/pt17/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/home/lbhm/miniconda3/envs/pt17/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lbhm/miniconda3/envs/pt17/lib/python3.8/site-packages/torch/utils/data/_utils/pin_memory.py", line 28, in _pin_memory_loop
    idx, data = r
ValueError: not enough values to unpack (expected 2, got 0)

Traceback (most recent call last):
  File "/home/lbhm/Projects/dl2-benchmark/dl2/pytorch/experiments/pytorch_test_raw.py", line 429, in <module>
    main()
  File "/home/lbhm/Projects/dl2-benchmark/dl2/pytorch/experiments/pytorch_test_raw.py", line 112, in main
    main_worker(args.gpu, ngpus_per_node, args)
  File "/home/lbhm/Projects/dl2-benchmark/dl2/pytorch/experiments/pytorch_test_raw.py", line 245, in main_worker
    train(train_loader, model, criterion, optimizer, epoch, args)
  File "/home/lbhm/Projects/dl2-benchmark/dl2/pytorch/experiments/pytorch_test_raw.py", line 280, in train
    for i, (images, target) in enumerate(train_loader):
  File "/home/lbhm/miniconda3/envs/pt17/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 349, in __iter__
    self._iterator._reset(self)
  File "/home/lbhm/miniconda3/envs/pt17/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 852, in _reset
    data = self._get_data()
  File "/home/lbhm/miniconda3/envs/pt17/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1029, in _get_data
    raise RuntimeError('Pin memory thread exited unexpectedly')
RuntimeError: Pin memory thread exited unexpectedly

Process finished with exit code 1

I only came across the persistent_workers feature today, while thinking about ways to cache information in a custom dataset, and I haven't found any helpful leads so far. Has anyone seen this error before, or does anyone have an idea where it could come from?

[1] examples/main.py at master · pytorch/examples · GitHub

Update: To verify that this error is specifically related to pinned memory, I also ran the script above with pin_memory=False when creating the data loader, and the error did not occur. So there seems to be an erroneous interaction between pin_memory=True and persistent_workers=True.
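For anyone who wants to try this without the full ImageNet script, here is a minimal sketch of the same comparison. It is not the script from the traceback: the helper names (make_loader_kwargs, run_two_epochs) and the tiny random dataset are made up for illustration. The traceback goes through `_reset`, which appears to be called when an existing iterator is reused, so the sketch iterates over the loader twice.

```python
def make_loader_kwargs(pin_memory, persistent_workers, num_workers=2, batch_size=4):
    """Build DataLoader keyword arguments for one configuration under test.

    Hypothetical helper, not part of PyTorch.
    """
    if persistent_workers and num_workers == 0:
        # DataLoader itself rejects this combination with a ValueError.
        raise ValueError("persistent_workers requires num_workers > 0")
    return {
        "batch_size": batch_size,
        "num_workers": num_workers,
        "pin_memory": pin_memory,
        "persistent_workers": persistent_workers,
    }


def run_two_epochs(pin_memory, persistent_workers):
    """Iterate twice over a small random dataset and count the batches."""
    # Imported here so the helper above can be used without PyTorch installed.
    import torch
    from torch.utils.data import DataLoader, TensorDataset

    # Assumption: random tensors stand in for the ImageNet subset (200 classes).
    dataset = TensorDataset(torch.randn(32, 3), torch.randint(0, 200, (32,)))
    loader = DataLoader(dataset, **make_loader_kwargs(pin_memory, persistent_workers))

    batches = 0
    for _epoch in range(2):
        # The second pass reuses the persistent workers instead of restarting them.
        for _images, _target in loader:
            batches += 1
    return batches
```

Calling run_two_epochs(pin_memory=True, persistent_workers=True) and then run_two_epochs(pin_memory=False, persistent_workers=True) mirrors the two runs described above. Call it from inside an `if __name__ == "__main__":` block, since the loader starts worker processes, and note that pinning only takes effect on a machine with CUDA available, so a CPU-only machine may not show the error.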


Has this been fixed? I am running into the same problem.

running into a similar problem!

Apparently it has been fixed in the nightly build a few days ago:

still running into a similar problem!

Which nightly version are you using and could you post the complete error message?
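One way to answer the version question is to read the installed package metadata. This is only a sketch: installed_version is a made-up helper name, and it assumes the build is registered under the distribution name "torch". Printing `torch.__version__` after `import torch` should report the same string.

```python
from importlib import metadata


def installed_version(package="torch"):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("torch version:", installed_version())
```

Posting that string together with the full traceback makes it easier to tell whether the build already contains the fix.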