What do you check when your code fails on multiple GPUs?

I mean, after adding a new layer to my CNN, everything runs normally on a single GPU. But after making multiple GPUs visible, I get an error like:

Traceback (most recent call last):
  File "train3_bilinear_pooling.py", line 400, in <module>
    run()
  File "train3_bilinear_pooling.py", line 219, in run
    train(train_loader, model, criterion, optimizer, epoch)
  File "train3_bilinear_pooling.py", line 326, in train
    return _each_epoch('train', train_loader, model, criterion, optimizer, epoch)
  File "train3_bilinear_pooling.py", line 270, in _each_epoch
    output = model(input_var)
  File "/home/member/fuwang/opt/anaconda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 319, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/member/fuwang/opt/anaconda/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 67, in forward
    replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
  File "/home/member/fuwang/opt/anaconda/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 72, in replicate
    return replicate(module, device_ids)
  File "/home/member/fuwang/opt/anaconda/lib/python3.6/site-packages/torch/nn/parallel/replicate.py", line 19, in replicate
    buffer_copies = comm.broadcast_coalesced(buffers, devices)
  File "/home/member/fuwang/opt/anaconda/lib/python3.6/site-packages/torch/cuda/comm.py", line 55, in broadcast_coalesced
    for chunk in _take_tensors(tensors, buffer_size):
  File "/home/member/fuwang/opt/anaconda/lib/python3.6/site-packages/torch/_utils.py", line 232, in _take_tensors
    if tensor.is_sparse:
  File "/home/member/fuwang/opt/anaconda/lib/python3.6/site-packages/torch/autograd/variable.py", line 68, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'Variable' object has no attribute 'is_sparse'

By the way, I am using model = nn.DataParallel(model).cuda().

Did you set a buffer as a Variable? It should be a tensor.
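To illustrate the point: a minimal sketch (the module name and buffer name are hypothetical, not from the thread) of registering a buffer as a plain tensor, which is what nn.DataParallel expects when it broadcasts buffers to each replica. In the PyTorch version used here (pre-0.4), wrapping the buffer in a Variable triggers the AttributeError above.

```python
import torch
import torch.nn as nn

class BilinearHead(nn.Module):
    """Hypothetical module showing correct buffer registration."""
    def __init__(self, dim):
        super().__init__()
        # Correct: register a plain tensor as a buffer, so that
        # nn.DataParallel can broadcast it to every device replica.
        self.register_buffer('rand_h', torch.randn(dim))
        # Incorrect in pre-0.4 PyTorch (causes the error above):
        # self.register_buffer('rand_h', torch.autograd.Variable(torch.randn(dim)))

    def forward(self, x):
        # The buffer moves with the module across devices automatically.
        return x + self.rand_h

model = BilinearHead(4)
out = model(torch.zeros(2, 4))
```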


Thank you, that was the problem. It has since been fixed by the original author of pytorch_compact_bilinear_pooling.