Traceback (most recent call last):
File "process.py", line 73, in <module>
eval('../amazon/densenet169_new',128,256,2)
File "/home/zhenghuabin/jianglibin/amazon/eval.py", line 49, in eval
for step, (data, target) in enumerate(validate_loader):
File "/home/zhenghuabin/anaconda3/envs/py35/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 206, in __next__
idx, batch = self.data_queue.get()
File "/home/zhenghuabin/anaconda3/envs/py35/lib/python3.5/multiprocessing/queues.py", line 345, in get
return ForkingPickler.loads(res)
File "/home/zhenghuabin/anaconda3/envs/py35/lib/python3.5/site-packages/torch/multiprocessing/reductions.py", line 70, in rebuild_storage_fd
fd = df.detach()
File "/home/zhenghuabin/anaconda3/envs/py35/lib/python3.5/multiprocessing/resource_sharer.py", line 58, in detach
return reduction.recv_handle(conn)
File "/home/zhenghuabin/anaconda3/envs/py35/lib/python3.5/multiprocessing/reduction.py", line 181, in recv_handle
return recvfds(s, 1)[0]
File "/home/zhenghuabin/anaconda3/envs/py35/lib/python3.5/multiprocessing/reduction.py", line 160, in recvfds
len(ancdata))
RuntimeError: received 0 items of ancdata
Here is my code:
dset_validate = AmazonDateset_validate(validate_index)
validate_loader = DataLoader(dset_validate, batch_size=batch_size, num_workers=4)
for step, (data, target) in enumerate(validate_loader):
It always fails at the same point (after finishing some particular epoch). How can I solve it?
This problem only happens on one particular machine: the same code works fine on my other servers. I can get rid of the error by setting num_workers=0, but that makes my training too slow.
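For reference, the workaround is just the same DataLoader call as above with the worker processes disabled:

# Workaround: load batches in the main process instead of worker subprocesses.
# The error goes away, but loading becomes much slower.
validate_loader = DataLoader(dset_validate, batch_size=batch_size, num_workers=0)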
Most of the solutions here deal with the symptoms, but not the cause.
This link explains the cause (a memory leak when storing too much data coming from a DataLoader) and how to solve it.
I copied the code from the aforementioned link:
from copy import deepcopy
from torch.utils.data import DataLoader

pred_list = []
target_list = []
# long version
for inputs, targets in DataLoader(dataset, num_workers=6, batch_size=64):
    pred_list.append(model.predict_on_batch(inputs))  # make model prediction
    targets_copy = deepcopy(targets)  # copy the tensor so the worker's shared storage can be released
    target_list.append(targets_copy)
    del inputs
    del targets
With larger test sets, not using multiprocessing (num_workers=0) can be a lot slower.
Btorb’s solution is the better one: the essential part is to deepcopy the targets coming out of the DataLoader, so no reference to the original worker tensors is kept.
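Applied to the evaluation loop from the question, a minimal sketch of the same idea (assuming model, dset_validate and batch_size are defined as in the question; the variable names and the detach/cpu calls are only illustrative):

from copy import deepcopy
from torch.utils.data import DataLoader

validate_loader = DataLoader(dset_validate, batch_size=batch_size, num_workers=4)

pred_list = []
target_list = []
for step, (data, target) in enumerate(validate_loader):
    output = model(data)                     # forward pass on the batch
    pred_list.append(output.detach().cpu())  # detach so the graph (and the input batch) is not kept alive
    target_list.append(deepcopy(target))     # copy instead of keeping the worker's shared tensor
    del data, target                         # drop the references to the original batch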