I am trying to train inception_v3 in a multi-GPU environment, but wrapping the model in DistributedDataParallel fails with the following error:
Traceback (most recent call last):
  File "/home/ljy/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2885, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-22-139a04d1be98>", line 1, in <module>
    model = torch.nn.parallel.DistributedDataParallel(model)
  File "/home/ljy/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 103, in __init__
    dist.broadcast(p, 0)
  File "/home/ljy/anaconda3/lib/python3.6/site-packages/torch/distributed/__init__.py", line 197, in broadcast
    "collective only supported in process-group mode"
AssertionError: collective only supported in process-group mode
The code is as follows (PyTorch 0.3.1):
import torch
import torchvision.models as models

model = models.inception_v3()
model.cuda()
# This line raises the AssertionError shown in the traceback
model = torch.nn.parallel.DistributedDataParallel(model)