I have a model that works fine on a single GPU, but when I try to use DataParallel
I get this error:
File "mmod/runtorch.py", line 224, in train_batch
loss = sum(model(data, labels))
File "/opt/conda/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 115, in forward
return self.gather(outputs, self.output_device)
File "/opt/conda/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 127, in gather
return gather(outputs, output_device, dim=self.dim)
File "/opt/conda/lib/python2.7/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather
return gather_map(outputs)
File "/opt/conda/lib/python2.7/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
return type(out)(map(gather_map, zip(*outputs)))
File "/opt/conda/lib/python2.7/site-packages/torch/nn/parallel/scatter_gather.py", line 55, in gather_map
return Gather.apply(target_device, dim, *outputs)
File "/opt/conda/lib/python2.7/site-packages/torch/nn/parallel/_functions.py", line 54, in forward
ctx.input_sizes = tuple(map(lambda i: i.size(ctx.dim), inputs))
File "/opt/conda/lib/python2.7/site-packages/torch/nn/parallel/_functions.py", line 54, in <lambda>
ctx.input_sizes = tuple(map(lambda i: i.size(ctx.dim), inputs))
RuntimeError: dimension specified as 0 but tensor has no dimensions
If I instead return loss.unsqueeze(dim=0)
from forward(), the error message becomes:
File "/opt/conda/lib/python2.7/site-packages/torch/autograd/__init__.py", line 27, in _make_grads
raise RuntimeError("grad can be implicitly created only for scalar outputs")
RuntimeError: grad can be implicitly created only for scalar outputs
I know of this and this issue, but is there any workaround, or do I have to update from PyTorch 0.4 to master?
Please note that I have multiple losses (model.forward()
returns a tuple of 0-dim loss tensors), one for each part of the last layer, which is why I sum()
them and call backward() on the result.
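For reference, here is a minimal CPU sketch of the workaround I am asking about (the loss names and shapes are illustrative, not from my actual model): unsqueeze each 0-dim loss to shape (1,) inside forward() so that gather() has a dimension to concatenate along, and then reduce the summed result back to a scalar with .sum() (or .mean()) before calling backward():

```python
import torch

def forward_losses():
    # Stand-ins for the per-part 0-dim losses computed in forward();
    # each is unsqueezed to shape (1,) so DataParallel's gather()
    # has a dimension to concatenate along.
    x = torch.tensor([1.0, 2.0], requires_grad=True)
    loss_a = (x ** 2).mean().unsqueeze(0)  # shape (1,)
    loss_b = x.sum().unsqueeze(0)          # shape (1,)
    return (loss_a, loss_b), x

losses, x = forward_losses()

# Under DataParallel each gathered loss would have shape (num_gpus,);
# summing the tuple keeps that shape, so reduce to a 0-dim scalar
# before backward() to avoid the "scalar outputs" error.
total = sum(losses)   # shape (1,) here, (num_gpus,) under DataParallel
total = total.sum()   # 0-dim scalar: backward() can create grads implicitly
total.backward()
```

Averaging with .mean() instead of .sum() would divide the gradient by the number of GPUs, which may or may not be what you want depending on how each replica's loss is normalized.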