Error with multiple GPUs: RuntimeError: NCCL Error 2: unhandled system error

When I run my code on multiple GPUs, it occasionally crashes with the following error:

File "main.py", line 132, in train
    model.train(train_loader, val_loader)
  File "/mnt/DATA/code/bitbucket/drn_seg/segment/seg_model.py", line 54, in train
    self.train_epoch(epoch, train_loader, val_loader)
  File "/mnt/DATA/code/bitbucket/drn_seg/segment/seg_model.py", line 93, in train_epoch
    output = self.model_seg(image)[0]
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/data_parallel.py", line 113, in forward
    replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/data_parallel.py", line 118, in replicate
    return replicate(module, device_ids)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/replicate.py", line 12, in replicate
    param_copies = Broadcast.apply(devices, *params)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/_functions.py", line 17, in forward
    outputs = comm.broadcast_coalesced(inputs, ctx.target_gpus)
  File "/usr/local/lib/python3.5/dist-packages/torch/cuda/comm.py", line 40, in broadcast_coalesced
    return torch._C._broadcast_coalesced(tensors, devices, buffer_size)
RuntimeError: NCCL Error 2: unhandled system error

Hi, could anyone help me figure out what is causing this?
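
In case it helps, here is a stripped-down sketch of what my training code does. The traceback shows the crash happens inside nn.DataParallel's replicate/broadcast step, so the sketch below exercises that same path; TinySegNet and the tensor shapes are simplified stand-ins, not the actual DRN segmentation model from seg_model.py:

import torch
import torch.nn as nn

# Simplified stand-in for the real segmentation model.
class TinySegNet(nn.Module):
    def __init__(self):
        super(TinySegNet, self).__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)

    def forward(self, x):
        # Return a tuple so callers index [0], like self.model_seg(image)[0]
        # in my training loop.
        return (self.conv(x),)

# Wrapping in DataParallel replicates the model across all visible GPUs;
# the NCCL broadcast in replicate() is where the error is raised.
model = nn.DataParallel(TinySegNet()).cuda()

image = torch.randn(4, 3, 64, 64).cuda()
output = model(image)[0]

This runs per batch inside train_epoch, and the failure is intermittent rather than on the first iteration.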