DataParallel with scalar loss: dimension specified as 0 but tensor has no dimensions

I have a model that works fine on a single GPU, but when I try to use DataParallel I get this error:

  File "mmod/runtorch.py", line 224, in train_batch
    loss = sum(model(data, labels))
  File "/opt/conda/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 115, in forward
    return self.gather(outputs, self.output_device)
  File "/opt/conda/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 127, in gather
    return gather(outputs, output_device, dim=self.dim)
  File "/opt/conda/lib/python2.7/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather
    return gather_map(outputs)
  File "/opt/conda/lib/python2.7/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
    return type(out)(map(gather_map, zip(*outputs)))
  File "/opt/conda/lib/python2.7/site-packages/torch/nn/parallel/scatter_gather.py", line 55, in gather_map
    return Gather.apply(target_device, dim, *outputs)
  File "/opt/conda/lib/python2.7/site-packages/torch/nn/parallel/_functions.py", line 54, in forward
    ctx.input_sizes = tuple(map(lambda i: i.size(ctx.dim), inputs))
  File "/opt/conda/lib/python2.7/site-packages/torch/nn/parallel/_functions.py", line 54, in <lambda>
    ctx.input_sizes = tuple(map(lambda i: i.size(ctx.dim), inputs))
RuntimeError: dimension specified as 0 but tensor has no dimensions

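For context, this first error can be reproduced with any module whose forward() returns a 0-dim loss: DataParallel gathers the per-GPU outputs by concatenating them along dim=0, and a tensor with no dimensions has nothing to concatenate along. A minimal sketch (made-up module and variable names, assuming PyTorch 0.4 as in the traceback and at least two GPUs):

    import torch
    import torch.nn as nn

    class ScalarLossModel(nn.Module):
        def forward(self, x):
            # 0-dim tensor: Gather has no dim 0 to concatenate along
            return x.mean()

    model = nn.DataParallel(ScalarLossModel()).cuda()
    x = torch.randn(8, 4).cuda()
    loss = model(x)  # RuntimeError: dimension specified as 0 but tensor has no dimensions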
If I return loss.unsqueeze(dim=0) instead, the error message becomes:

  File "/opt/conda/lib/python2.7/site-packages/torch/autograd/__init__.py", line 27, in _make_grads
    raise RuntimeError("grad can be implicitly created only for scalar outputs")
RuntimeError: grad can be implicitly created only for scalar outputs

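I think what happens here is that after unsqueeze(dim=0) each replica returns a loss of shape (1,), and gather concatenates those into a tensor of shape (num_gpus,), so the summed loss is no longer a scalar and backward() cannot create the implicit gradient. Roughly (shapes assume two GPUs):

    outputs = model(data, labels)  # tuple of tensors, each of shape (2,) after gather
    loss = sum(outputs)            # still shape (2,), not a scalar
    loss.backward()                # -> grad can be implicitly created only for scalar outputs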
I know of this and this issue, but is there any workaround, or do I have to update from PyTorch 0.4 to master?

Please note that I have multiple losses for different parts of the last layer (model.forward() returns a tuple of 0-dim loss tensors), so I sum() them and call backward() on the result.

Thanks to the workaround here:

Instead of returning a tuple of 0-dim tensors for the loss:

    return tuple(loss_list)

I now return:

    return torch.stack(loss_list).squeeze()

Everything works.
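
For completeness, here is a minimal end-to-end sketch of how that return value fits into the training step. The module, heads, and variable names are illustrative (not from my actual code), and it assumes at least two GPUs:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultiLossModel(nn.Module):
        def __init__(self):
            super(MultiLossModel, self).__init__()
            self.backbone = nn.Linear(16, 8)
            self.head_a = nn.Linear(8, 1)
            self.head_b = nn.Linear(8, 1)

        def forward(self, data, labels):
            feats = self.backbone(data)
            loss_list = [
                F.mse_loss(self.head_a(feats).squeeze(1), labels),
                F.mse_loss(self.head_b(feats).squeeze(1), labels),
            ]
            # stack the 0-dim losses into a 1-dim tensor so gather can
            # concatenate the per-GPU copies along dim 0 (squeeze() is a
            # no-op here because there is more than one loss)
            return torch.stack(loss_list).squeeze()

    model = nn.DataParallel(MultiLossModel()).cuda()
    data = torch.randn(32, 16).cuda()
    labels = torch.randn(32).cuda()

    # after gather the output has shape (num_losses * num_gpus,);
    # reduce it to a scalar before calling backward()
    loss = model(data, labels).sum()
    loss.backward()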
