I hit this error when using torch.nn.parallel.DistributedDataParallel (PyTorch 1.4.0), together with the following code:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
tensor = torch.zeros(*shape, device=device).scatter_add(1, segment_ids, data)
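For context, here is a minimal, self-contained sketch of that scatter_add call in isolation (the shapes and variable names are illustrative, not taken from my real training code):

import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

batch_size, num_segments, num_items = 2, 4, 6
shape = (batch_size, num_segments)

# Random stand-ins for my real inputs: values to sum, and the segment each value belongs to.
data = torch.rand(batch_size, num_items, device=device)
segment_ids = torch.randint(0, num_segments, (batch_size, num_items), device=device)

tensor = torch.zeros(*shape, device=device).scatter_add(1, segment_ids, data)
print(tensor.shape)  # torch.Size([2, 4])

The error only shows up later, in the backward pass of the training loop: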
File "/home/gezi/mine/pikachu/utils/melt/eager/train.py", line 1398, in train
loss.backward()
File "/home/gezi/env/anaconda3/lib/python3.6/site-packages/torch/tensor.py", line 195, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/home/gezi/env/anaconda3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: Tensors must be CUDA and dense
How can I solve this? I tried many variants, such as
tensor = torch.zeros(*shape).cuda().scatter_add(1, segment_ids, data)
but this only works for DataParallel, not DistributedDataParallel.
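For reference, the way the model is wrapped in the two cases looks roughly like this (a heavily simplified sketch: the real model, data pipeline, and launch script are omitted, and the SegmentSum module below is only a stand-in for the part of my model that calls scatter_add):

import torch
import torch.nn as nn

class SegmentSum(nn.Module):
    # Toy stand-in for the part of my model that uses scatter_add.
    def __init__(self, num_segments=4):
        super().__init__()
        self.num_segments = num_segments
        self.weight = nn.Parameter(torch.randn(1))

    def forward(self, data, segment_ids):
        shape = (data.size(0), self.num_segments)
        zeros = torch.zeros(*shape, device=data.device)
        return zeros.scatter_add(1, segment_ids, data * self.weight)

model = SegmentSum().cuda()

# Case 1: DataParallel -- a single process drives all visible GPUs; loss.backward() works.
dp_model = nn.DataParallel(model)

# Case 2: DistributedDataParallel -- multiple processes (launch details omitted);
# here loss.backward() raises "RuntimeError: Tensors must be CUDA and dense".
# torch.distributed.init_process_group(backend="nccl")
# ddp_model = nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])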
Another problem with DistributedDataParallel is that each process uses all GPUs, as shown below. Is this by design?