Hello,
I am trying to use DataParallel to train my model on multiple GPUs. The forward function of my model returns a dict with two keys, "predicted_tags" and "loss" (it is basically an NER model). I thought this use case was supported, but I am getting the following error:
  File "/mnt/nfs/home/foo/myenv/lib64/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 163, in forward
    return self.gather(outputs, self.output_device)
  File "/mnt/nfs/home/foo/myenv/lib64/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 175, in gather
    return gather(outputs, output_device, dim=self.dim)
  File "/mnt/nfs/home/foo/myenv/lib64/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 73, in gather
    res = gather_map(outputs)
  File "/mnt/nfs/home/foo/myenv/lib64/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather_map
    raise(e)
  File "/mnt/nfs/home/foo/myenv/lib64/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 64, in gather_map
    for k in out))
  File "/mnt/nfs/home/foo/myenv/lib64/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 64, in <genexpr>
    for k in out))
  File "/mnt/nfs/home/foo/myenv/lib64/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather_map
    raise(e)
  File "/mnt/nfs/home/foo/myenv/lib64/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 65, in gather_map
    return type(out)(map(gather_map, zip(*outputs)))
  File "/mnt/nfs/home/foo/myenv/lib64/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather_map
    raise(e)
  File "/mnt/nfs/home/foo/myenv/lib64/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 65, in gather_map
    return type(out)(map(gather_map, zip(*outputs)))
  File "/mnt/nfs/home/foo/myenv/lib64/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather_map
    raise(e)
  File "/mnt/nfs/home/foo/myenv/lib64/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 65, in gather_map
    return type(out)(map(gather_map, zip(*outputs)))
TypeError: zip argument #1 must support iteration
Note that I do not get this error if I don't populate predicted_tags and instead return, for instance,
{'predicted_tags': None, 'loss': sth}
Do you have any idea? I added some torch.save calls guided by the stack trace, and all I can say is that in data_parallel.py, just before return self.gather(outputs, self.output_device) runs, outputs is still the correct list of per-GPU dicts, e.g.

[
    {'predicted_tags': [[0, 0, 0], [0, 0], [0, 0, 0, 0]], 'loss': 0},
    {'predicted_tags': [[0, 0, 0, 0, 0]], 'loss': 0}
]

but later in the process, just before the final exception, outputs is just (0, 0). Hope these details help.