Unhandled CUDA Error (1) v0.2

Traceback (most recent call last):
  File "main.py", line 201, in <module>
    loss_list, lr_epoch, mu_epoch = train(epoch)
  File "main.py", line 132, in train
    outputs = net(inputs)
  File "/home/lala/miniconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/lala/miniconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 59, in forward
    replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
  File "/home/lala/miniconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 64, in replicate
    return replicate(module, device_ids)
  File "/home/lala/miniconda2/lib/python2.7/site-packages/torch/nn/parallel/replicate.py", line 12, in replicate
    param_copies = Broadcast(devices)(*params)
  File "/home/lala/miniconda2/lib/python2.7/site-packages/torch/nn/parallel/_functions.py", line 19, in forward
    outputs = comm.broadcast_coalesced(inputs, self.target_gpus)
  File "/home/lala/miniconda2/lib/python2.7/site-packages/torch/cuda/comm.py", line 54, in broadcast_coalesced
    results = broadcast(_flatten_tensors(chunk), devices)
  File "/home/lala/miniconda2/lib/python2.7/site-packages/torch/cuda/comm.py", line 24, in broadcast
  File "/home/lala/miniconda2/lib/python2.7/site-packages/torch/cuda/nccl.py", line 190, in broadcast
    data_type, root, comm[i], cudaStream()))
  File "/home/lala/miniconda2/lib/python2.7/site-packages/torch/cuda/nccl.py", line 118, in check_error
    raise NcclError(status)
torch.cuda.nccl.NcclError: Unhandled Cuda Error (1)

When I run without a GPU, the code is fine. On v0.1.12 it is fine on both GPU and CPU.

The lines I believe are at issue:

if use_cuda:
    net = torch.nn.DataParallel(net, device_ids=range(torch.cuda.device_count()))
    cudnn.benchmark = True
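
One workaround sometimes reported for this class of NCCL errors is to skip the DataParallel wrapper when only one GPU is visible, so the model never enters the replicate/broadcast path that fails in the traceback above. Below is a minimal sketch of the guard logic; pick_device_ids is a hypothetical helper, and the torch calls are shown as comments since they depend on your install:

```python
def pick_device_ids(n_gpus):
    """Device ids to hand to torch.nn.DataParallel, or None to skip wrapping.

    Skipping the wrapper on a single-GPU (or CPU-only) machine keeps the model
    off the NCCL broadcast path that raises "Unhandled Cuda Error (1)".
    """
    if n_gpus <= 1:
        return None
    return list(range(n_gpus))

# Intended use (requires torch; sketch only):
# n_gpus = torch.cuda.device_count()
# device_ids = pick_device_ids(n_gpus)
# if device_ids is not None:
#     net = torch.nn.DataParallel(net, device_ids=device_ids)
# cudnn.benchmark = True
```

This does not fix the underlying conda/NCCL mismatch, but it lets single-GPU runs proceed while the issue is investigated.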

This usually happens when there is some kind of CUDA library mismatch. I’ll follow up with you on the GitHub issue: https://github.com/pytorch/pytorch/issues/2332

What is the current status of this issue? I installed PyTorch using conda install, and since v0.2 came out I am no longer able to parallelize across my GPUs via torch.nn.DataParallel. My GPUs are both GTX 1080 Ti, the same make and model. DataParallel works without error when I install PyTorch via pip install, but that is not an ideal solution as it causes other problems.

Yes @Soumith_Chintala, please solve this issue. I am not able to install PyTorch using pip either, as the remote server does not support wheels. So I have to use conda, but then I cannot use DataParallel.