When using nn.DataParallel(model), I get the error cuda runtime error (711). I tried searching for what error 711 means, but I couldn't figure it out. I assume this is not a PyTorch issue, but I'm asking in case anyone knows why this happens.
Without nn.DataParallel(model), the model trains without any issue.
    prediction = self._model(images)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 151, in forward
    inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 162, in scatter
    return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/parallel/scatter_gather.py", line 36, in scatter_kwargs
    inputs = scatter(inputs, target_gpus, dim) if inputs else []
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/parallel/scatter_gather.py", line 28, in scatter
    res = scatter_map(inputs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/parallel/scatter_gather.py", line 15, in scatter_map
    return list(zip(*map(scatter_map, obj)))
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/parallel/scatter_gather.py", line 13, in scatter_map
    return Scatter.apply(target_gpus, None, dim, obj)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/parallel/_functions.py", line 89, in forward
    outputs = comm.scatter(input, target_gpus, chunk_sizes, ctx.dim, streams)
  File "/opt/conda/lib/python3.7/site-packages/torch/cuda/comm.py", line 147, in scatter
    return tuple(torch._C._scatter(tensor, devices, chunk_sizes, dim, streams))
RuntimeError: cuda runtime error (711) : peer mapping resources exhausted at /opt/conda/conda-bld/pytorch_1587428398394/work/aten/src/THC/THCGeneral.cpp:136
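For context, the traceback fails inside DataParallel's scatter step, where each input batch is split along dimension 0 into one chunk per GPU before being copied to the devices. The pure-Python sketch below only illustrates that splitting, assuming torch.chunk-style semantics (chunk size rounded up, possibly fewer chunks than devices). It does not touch CUDA or reproduce the error, and the helper name scatter_chunk_sizes is hypothetical.

```python
import math
from typing import List


def scatter_chunk_sizes(batch_size: int, num_devices: int) -> List[int]:
    """Hypothetical helper (not part of PyTorch): approximate how a batch of
    `batch_size` samples is split along dim 0 into `num_devices` pieces.

    Assumes torch.chunk semantics: every chunk has
    ceil(batch_size / num_devices) rows except possibly the last, and fewer
    than `num_devices` chunks are produced when the batch is small.
    """
    if batch_size <= 0 or num_devices <= 0:
        raise ValueError("batch_size and num_devices must be positive")
    per_chunk = math.ceil(batch_size / num_devices)
    sizes = []
    remaining = batch_size
    # Hand out full-size chunks until the batch is used up.
    while remaining > 0:
        take = min(per_chunk, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


if __name__ == "__main__":
    # A batch of 10 images split across 4 GPUs.
    print(scatter_chunk_sizes(10, 4))  # [3, 3, 3, 1]
```

Each chunk is then copied to its own device, which is where the real comm.scatter call in the traceback raised error 711.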