RuntimeError: received 0 items of ancdata

Traceback (most recent call last):
  File "process.py", line 73, in <module>
    eval('../amazon/densenet169_new',128,256,2)
  File "/home/zhenghuabin/jianglibin/amazon/eval.py", line 49, in eval
    for step, (data, target) in enumerate(validate_loader):
  File "/home/zhenghuabin/anaconda3/envs/py35/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 206, in __next__
    idx, batch = self.data_queue.get()
  File "/home/zhenghuabin/anaconda3/envs/py35/lib/python3.5/multiprocessing/queues.py", line 345, in get
    return ForkingPickler.loads(res)
  File "/home/zhenghuabin/anaconda3/envs/py35/lib/python3.5/site-packages/torch/multiprocessing/reductions.py", line 70, in rebuild_storage_fd
    fd = df.detach()
  File "/home/zhenghuabin/anaconda3/envs/py35/lib/python3.5/multiprocessing/resource_sharer.py", line 58, in detach
    return reduction.recv_handle(conn)
  File "/home/zhenghuabin/anaconda3/envs/py35/lib/python3.5/multiprocessing/reduction.py", line 181, in recv_handle
    return recvfds(s, 1)[0]
  File "/home/zhenghuabin/anaconda3/envs/py35/lib/python3.5/multiprocessing/reduction.py", line 160, in recvfds
    len(ancdata))
RuntimeError: received 0 items of ancdata

Here is my code:

from torch.utils.data import DataLoader

dset_validate = AmazonDateset_validate(validate_index)
validate_loader = DataLoader(dset_validate, batch_size=batch_size, num_workers=4)
for step, (data, target) in enumerate(validate_loader):

It always fails at the same position (after finishing a particular epoch). How can I solve this?

1 Like

This problem only happens on a particular machine: it occurs on one of my machines while the same code works fine on other servers. I can get rid of it with num_workers=0, but that makes my training too slow.

3 Likes

@Kyle - Check this GitHub issue (https://github.com/pytorch/pytorch/issues/973) and see toward the end for the potential fix: increasing the ulimit.
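For reference, the per-process open-file limit can also be inspected and raised from inside Python via the standard-library resource module (this is a sketch of the "increase ulimit" idea; running ulimit -n in the shell before launching the script is equivalent):

```python
import resource

# Query the current open-file limits (the values "ulimit -n" reports).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit={soft}, hard limit={hard}")

# Raise the soft limit up to the hard limit. DataLoader workers pass
# tensors back over file descriptors, so a low soft limit can trigger
# "received 0 items of ancdata" mid-epoch.
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
```

Raising the soft limit to the hard limit does not require root; going above the hard limit does.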

2 Likes

torch.multiprocessing.set_sharing_strategy('file_system')
Using this solved it for me.
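In context, the call goes at the top of the script, before any DataLoader with workers is created — a minimal sketch:

```python
import torch.multiprocessing as mp

# Switch inter-process sharing from "file_descriptor" (the Linux
# default, which sends an open fd per tensor and can exhaust the
# per-process fd limit) to "file_system", which sends file paths.
mp.set_sharing_strategy('file_system')

assert mp.get_sharing_strategy() == 'file_system'
```

Note the trade-off: with file_system, shared-memory files can be leaked if a process dies abnormally, which is why it is not the default.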

12 Likes

Most of the solutions here deal with the symptoms, but not the cause.

This link explains the cause (a memory leak when storing too much data from a dataloader) and how to solve it.

I've copied the code from the aforementioned link:

from copy import deepcopy

from torch.utils.data import DataLoader

pred_list = []
target_list = []
# long version
for inputs, targets in DataLoader(dataset, num_workers=6, batch_size=64):
    pred_list.append(model.predict_on_batch(inputs))  # make model prediction
    targets_copy = deepcopy(targets)  # copy targets out of the loader's shared memory
    target_list.append(targets_copy)
    del inputs
    del targets
3 Likes

Setting num_workers=0 fixed my issue. Thank you.

1 Like

With larger test sets, not using multiprocessing can be a lot slower.
Btorb's solution is the better one; the essential part is to deepcopy the targets the loader yields.

from copy import deepcopy

This worked for me. Any idea why this is needed?
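One likely explanation (my reading, not stated in the thread): batches produced by a worker live in shared memory backed by open file descriptors, and holding references to those tensors keeps the descriptors alive until the limit is hit. Making a copy with fresh, private storage (deepcopy, or .clone() shown here) releases that link:

```python
import torch

# Put a tensor into shared memory, as DataLoader workers do before
# sending a batch to the main process.
t = torch.ones(4)
t.share_memory_()
assert t.is_shared()

# clone() allocates ordinary private storage, so keeping the copy
# around does not pin the worker's shared-memory segment (and its fd).
c = t.clone()
assert not c.is_shared()
```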