Hello,
I am trying to use DataParallel to train my model on multiple GPUs. The forward function of my model returns a dict with two keys, "predicted_tags" and "loss" (it is basically an NER model). I thought this use case was supported, but I am getting the following error:
  File "/mnt/nfs/home/foo/myenv/lib64/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 163, in forward
    return self.gather(outputs, self.output_device)
  File "/mnt/nfs/home/foo/myenv/lib64/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 175, in gather
    return gather(outputs, output_device, dim=self.dim)
  File "/mnt/nfs/home/foo/myenv/lib64/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 73, in gather
    res = gather_map(outputs)
  File "/mnt/nfs/home/foo/myenv/lib64/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather_map
    raise(e)
  File "/mnt/nfs/home/foo/myenv/lib64/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 64, in gather_map
    for k in out))
  File "/mnt/nfs/home/foo/myenv/lib64/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 64, in <genexpr>
    for k in out))
  File "/mnt/nfs/home/foo/myenv/lib64/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather_map
    raise(e)
  File "/mnt/nfs/home/foo/myenv/lib64/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 65, in gather_map
    return type(out)(map(gather_map, zip(*outputs)))
  File "/mnt/nfs/home/foo/myenv/lib64/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather_map
    raise(e)
  File "/mnt/nfs/home/foo/myenv/lib64/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 65, in gather_map
    return type(out)(map(gather_map, zip(*outputs)))
  File "/mnt/nfs/home/foo/myenv/lib64/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather_map
    raise(e)
  File "/mnt/nfs/home/foo/myenv/lib64/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 65, in gather_map
    return type(out)(map(gather_map, zip(*outputs)))
TypeError: zip argument #1 must support iteration
Note that I do not get this error if I don't populate predicted_tags and instead return, for instance,
{'predicted_tags': None, 'loss': sth}
Do you have any idea? I added some torch.save calls guided by the stack trace, and all I can say is that in data_parallel.py, just before return self.gather(outputs, self.output_device) runs, outputs is still the correct list of per-GPU dicts, e.g.

[
    {'predicted_tags': [[0, 0, 0], [0, 0], [0, 0, 0, 0]], 'loss': 0},
    {'predicted_tags': [[0, 0, 0, 0, 0]], 'loss': 0}
]

but later in the process, just before the final exception, outputs is just (0, 0). Hope these details help.