Hello all,
I upgraded from PyTorch 1.6 to 1.7 today because I wanted to run some experiments with custom datasets and the new persistent_workers argument of the DataLoader class.
When I’m running my experiments, I keep getting errors from some internal files related to pinned memory. To make the issue reproducible, I also ran an experiment with the ResNet on ImageNet example from the PyTorch repo [1]. The only thing that I changed is that I fixed the number of classes to 200 because I am using a small subset of ImageNet for faster experimentation.
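For reference, here is a minimal sketch of the loader configuration that triggers the problem for me. The dataset, batch size, and image dimensions below are placeholders, not the values from the actual script:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset standing in for the ImageNet subset
# (64 tiny random "images", 200 classes) -- sizes are illustrative only.
dataset = TensorDataset(
    torch.randn(64, 3, 8, 8),
    torch.randint(0, 200, (64,)),
)

# pin_memory spawns a dedicated pinning thread, and persistent_workers
# (new in 1.7) keeps the worker processes alive between epochs.
loader = DataLoader(
    dataset,
    batch_size=16,
    num_workers=2,
    pin_memory=True,
    persistent_workers=True,
)

# The crash surfaces on the second epoch, when the iterator is reset
# while the workers and the pin-memory thread are being reused.
for epoch in range(2):
    for images, target in loader:
        pass
```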
This is the error that I’m receiving:
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/lbhm/miniconda3/envs/pt17/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/home/lbhm/miniconda3/envs/pt17/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lbhm/miniconda3/envs/pt17/lib/python3.8/site-packages/torch/utils/data/_utils/pin_memory.py", line 28, in _pin_memory_loop
    idx, data = r
ValueError: not enough values to unpack (expected 2, got 0)
Traceback (most recent call last):
  File "/home/lbhm/Projects/dl2-benchmark/dl2/pytorch/experiments/pytorch_test_raw.py", line 429, in <module>
    main()
  File "/home/lbhm/Projects/dl2-benchmark/dl2/pytorch/experiments/pytorch_test_raw.py", line 112, in main
    main_worker(args.gpu, ngpus_per_node, args)
  File "/home/lbhm/Projects/dl2-benchmark/dl2/pytorch/experiments/pytorch_test_raw.py", line 245, in main_worker
    train(train_loader, model, criterion, optimizer, epoch, args)
  File "/home/lbhm/Projects/dl2-benchmark/dl2/pytorch/experiments/pytorch_test_raw.py", line 280, in train
    for i, (images, target) in enumerate(train_loader):
  File "/home/lbhm/miniconda3/envs/pt17/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 349, in __iter__
    self._iterator._reset(self)
  File "/home/lbhm/miniconda3/envs/pt17/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 852, in _reset
    data = self._get_data()
  File "/home/lbhm/miniconda3/envs/pt17/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1029, in _get_data
    raise RuntimeError('Pin memory thread exited unexpectedly')
RuntimeError: Pin memory thread exited unexpectedly

Process finished with exit code 1
I only stumbled upon the persistent_workers feature today while thinking about caching information in a custom dataset, and I haven't found any helpful leads so far. Has anyone experienced this error before, or does anyone have an idea where it could come from?
[1] examples/main.py at master · pytorch/examples · GitHub
Update: To verify that this error is specifically related to pinned memory, I also ran the script above with pin_memory=False when creating the data loader, and the error did not occur. So there seems to be an erroneous interaction between pin_memory=True and persistent_workers=True.
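For anyone hitting the same issue, a possible workaround sketch (my own assumption, not an official fix): keep persistent_workers=True but disable the DataLoader's pin-memory thread, and pin each batch manually when a GPU is available. The dataset below is again a placeholder:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Illustrative dataset; the real script uses an ImageNet subset.
dataset = TensorDataset(
    torch.randn(64, 3, 8, 8),
    torch.randint(0, 200, (64,)),
)

# Disable the DataLoader's internal pin-memory thread but keep
# the persistent worker processes.
loader = DataLoader(
    dataset,
    batch_size=16,
    num_workers=2,
    pin_memory=False,
    persistent_workers=True,
)

for images, target in loader:
    if torch.cuda.is_available():
        # Tensor.pin_memory() returns a copy in page-locked host memory,
        # so a subsequent .to(device, non_blocking=True) can overlap
        # with computation.
        images = images.pin_memory()
        target = target.pin_memory()
```

This trades the background pinning thread for a synchronous pin_memory() call per batch, so it may be slower, but it avoids the crashing code path entirely.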