How to solve the problem of `RuntimeError: all tensors must be on devices[0]`


for i, (input, target) in enumerate(test_loader):
	target = target.cuda(async=True) # in test loader, pin_memory = True
	input_var = torch.autograd.Variable(input, volatile=False)

	# (Batch_Size, 10L, 3L, 32L, 224L, 224L)
	b, s, c, t, h, w = input_var.size()
	# view in (Batch_Size * 10L, 3L, 32L, 224L, 224L)
	input_var = input_var.view(-1, c, t, h, w)
	# forward
	output = model(input_var)
	# split in (Batch_Size, 10L, 400L)
	output = output.view(b, s, args.num_classes)
	# softmax
	scores = torch.sum(F.softmax(output, dim=2), dim=1, keepdim=False)
	# in-place average scores
	scores, indices = scores.div_(10).sort(dim=1, descending=True)


Traceback (most recent call last):
  File "", line 301, in <module>
  File "", line 164, in main
    inference(test_loader, model)
  File "", line 195, in inference
    output = model(input_var)
  File "/home/anaconda2/lib/python2.7/site-packages/torch/nn/modules/", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/", line 73, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/", line 83, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/", line 67, in parallel_apply
    raise output
RuntimeError: all tensors must be on devices[0]




When using DataParallel with a list of GPUs like [dev0, dev1, ...], all the inputs that you give to the module have to be on dev0.
Make sure that dev0 is the current device when you send your data to the GPU.
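
As a rough illustration of the advice above, here is a minimal sketch that keeps both the model's parameters and the input batch on `device_ids[0]`. The `nn.Linear` model and the tensor shapes are hypothetical stand-ins (the thread's real model is not shown), and the CPU branch exists only so the sketch runs on a machine without GPUs.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model; the model from the thread is not shown.
model = nn.Linear(8, 4)

if torch.cuda.is_available():
    device_ids = list(range(torch.cuda.device_count()))
    primary = torch.device("cuda", device_ids[0])
    # Parameters live on device_ids[0]; DataParallel replicates them itself.
    model = nn.DataParallel(model.to(primary), device_ids=device_ids)
else:
    # CPU fallback so the sketch also runs without a GPU.
    primary = torch.device("cpu")
    model = model.to(primary)

batch = torch.randn(16, 8).to(primary)  # inputs go to device_ids[0] as well

with torch.no_grad():
    output = model(batch)

print(output.shape, output.device)
```

The gathered output ends up on the same primary device, so the target should be moved there too before computing a loss.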

How do I manually send data to GPU dev0?


Either set the current device manually with torch.cuda.set_device(dev0), or use the context manager as follows so that the current device is not changed globally:

with torch.cuda.device(dev0):
    t = t.cuda()
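
A small sketch of the two options side by side, assuming GPU index 0 is the intended dev0. The tensor `t` is a hypothetical placeholder, and the CPU branch is only there so the snippet runs without a GPU.

```python
import torch

t = torch.zeros(2, 2)

if torch.cuda.is_available():
    # Option 1: temporarily make GPU 0 the current device.
    with torch.cuda.device(0):
        a = t.cuda()      # .cuda() without arguments targets the current device
    # Option 2: name the target device explicitly.
    b = t.to("cuda:0")
else:
    # CPU fallback so the sketch still runs without a GPU.
    a = b = t

print(a.device, b.device)
```

Naming the device explicitly avoids depending on whatever the current device happens to be.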

Is this right? I still encounter the error.

model = torch.nn.DataParallel(model.cuda(), device_ids=[0,1,2,3])

for i, (input, target) in enumerate(test_loader):
    with torch.cuda.device(0):
        target = target.cuda(async=True) # in test loader, pin_memory = True
        input_var = torch.autograd.Variable(input, volatile=False)

For the target, yes. Is your input already on the GPU?

I am following this code, which converts the input directly from a FloatTensor to a Variable, not from a CudaFloatTensor. So should I write the code like this:

for i, (input, target) in enumerate(test_loader):
    with torch.cuda.device(0):
        target = target.cuda(async=True) # in test loader, pin_memory = True
        input = input.cuda(async=True)

    target_var = torch.autograd.Variable(target, volatile=False)
    input_var = torch.autograd.Variable(input, volatile=False)

Yes, it should look like this!
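
For readers on PyTorch 0.4 or later: the `async` argument was renamed `non_blocking`, `Variable` wrappers are no longer needed, and `torch.no_grad()` took over the role of `volatile=True`. A hedged sketch of the same loop in that style follows; the toy `TensorDataset` is a hypothetical stand-in for the thread's `test_loader`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

use_cuda = torch.cuda.is_available()
device = torch.device("cuda:0" if use_cuda else "cpu")

# Hypothetical toy data standing in for the thread's test_loader.
dataset = TensorDataset(torch.randn(8, 3), torch.randint(0, 2, (8,)))
test_loader = DataLoader(dataset, batch_size=4, pin_memory=use_cuda)

seen = 0
with torch.no_grad():  # inference only, no autograd bookkeeping
    for inputs, target in test_loader:
        # Both tensors go to the same device (device_ids[0] for DataParallel).
        inputs = inputs.to(device, non_blocking=True)
        target = target.to(device, non_blocking=True)
        seen += inputs.size(0)

print(seen, inputs.device)
```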


Why do all the inputs have to be on dev0?

That’s a convention used by DataParallel. It performs the split and the copy to the other devices itself. The user does not know how the data will be split, so you cannot copy the data to the other devices in advance.
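
To get a feel for the split, the sketch below uses `torch.chunk` along the batch dimension on the CPU. This only approximates what DataParallel does internally when it scatters a batch; the batch size of 10 and the device count of 4 are hypothetical values chosen for illustration.

```python
import torch

# Hypothetical batch of 10 samples and 4 devices.
batch = torch.arange(10).reshape(10, 1)
num_devices = 4

# Split along dim 0 into at most num_devices roughly equal pieces.
chunks = torch.chunk(batch, num_devices, dim=0)
sizes = [c.size(0) for c in chunks]

print(sizes)
```

The pieces are not guaranteed to be equal in size, which is one reason the caller cannot place the data on the other devices ahead of time.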


Thank you very much for your answer!

Hello, I have a question about DataParallel. I want to use the GPUs on the server whose IDs are 0 and 3.

Traceback (most recent call last):
  File "", line 67, in <module>
  File "", line 19, in train
  File "/home/wangxu/miniconda3/envs/pt2/lib/python2.7/site-packages/torch/nn/parallel/", line 102, in __init__
  File "/home/wangxu/miniconda3/envs/pt2/lib/python2.7/site-packages/torch/nn/parallel/", line 17, in _check_balance
    dev_props = [torch.cuda.get_device_properties(i) for i in device_ids]
  File "/home/wangxu/miniconda3/envs/pt2/lib/python2.7/site-packages/torch/cuda/", line 292, in get_device_properties
    raise AssertionError("Invalid device id")
AssertionError: Invalid device id

Also, I set calculator = device('cuda').
When I change device_ids to [0], the code works.
So I am still confused about how to use DataParallel. The official example on GitHub doesn’t help.

Hi, in your shell, the command should look like:


and in your script, the GPU config should be set to [0,1].
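
The command itself did not survive in the thread. Assuming it restricts the process with the CUDA_VISIBLE_DEVICES environment variable (for example to physical GPUs 0 and 3), the renumbering behind the [0,1] advice can be sketched as follows. The helper `logical_ids` is hypothetical and written only for this illustration; it assumes the variable holds plain integer indices that all exist on the machine.

```python
def logical_ids(cuda_visible_devices):
    """Map physical GPU indices to the indices the process sees.

    Hypothetical helper: assumes a comma-separated list of valid
    integer indices, e.g. "0,3".
    """
    physical = [int(x) for x in cuda_visible_devices.split(",")]
    # Visible devices are renumbered from 0 in the order they are listed.
    return {p: i for i, p in enumerate(physical)}


mapping = logical_ids("0,3")
device_ids = sorted(mapping.values())

print(mapping)      # physical -> logical
print(device_ids)   # what to pass as device_ids to DataParallel
```

Under that assumption, physical GPU 3 becomes device 1 inside the process, so `device_ids=[0, 3]` refers to a device that is not visible, which would be consistent with the "Invalid device id" error.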


Thank you for your reply.
I changed the code following your advice and got a new error:

/home/wangxu/miniconda3/envs/pt2/lib/python2.7/site-packages/torch/nn/parallel/ UserWarning:
There is an imbalance between your GPUs. You may want to exclude GPU 0 which
has less than 75% of the memory or cores of GPU 1. You can do so by setting
the device_ids argument to DataParallel, or by setting the CUDA_VISIBLE_DEVICES
environment variable.
warnings.warn(imbalance_warn.format(device_ids[min_pos], device_ids[max_pos]))
Traceback (most recent call last):
  File "", line 130, in <module>
  File "", line 95, in train
    loss1_1 = loss(out1_1 * mask_var, vecmap_var)
  File "/home/wangxu/miniconda3/envs/pt2/lib/python2.7/site-packages/torch/nn/modules/", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/wangxu/miniconda3/envs/pt2/lib/python2.7/site-packages/torch/nn/modules/", line 372, in forward
    return F.mse_loss(input, target, size_average=self.size_average, reduce=self.reduce)
  File "/home/wangxu/miniconda3/envs/pt2/lib/python2.7/site-packages/torch/nn/", line 1569, in mse_loss
    input, target, size_average, reduce)
  File "/home/wangxu/miniconda3/envs/pt2/lib/python2.7/site-packages/torch/nn/", line 1537, in _pointwise_loss
    return lambd_optimized(input, target, size_average, reduce)
RuntimeError: after cudaLaunch in triple_chevron_launcher::launch(): out of memory

Is someone else using the GPU, or does the server have different cards installed?
Could you explain a bit about the setup?

Might be unrelated here, but sometimes the device ids don’t match what nvidia-smi reports.

I was watching nvidia-smi every 3 seconds. It seems no one was using GPUs 0 and 3.
The server has 8 GPUs installed, and they are all the same type (1080Ti).
What else do you want to know?

Could you add CUDA_DEVICE_ORDER=PCI_BUS_ID in front of your Python call?
Just curious whether the device ids are assigned differently.

Something doesn’t seem right.
The _check_balance method only checks for the GPU specs (total memory and multi processor count).
So if all cards are 1080Tis, the warning shouldn’t appear.

In your first attempt to use DataParallel you used device_ids=[0, 3] and got “Invalid device id” back.

Are you sure you are working on the right server?
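
One way to check the specs mentioned above (total memory and multiprocessor count) is to print them for every device the process can see. This is a minimal sketch, not the code of `_check_balance` itself; the helper name `gpu_specs` is hypothetical, and on a machine without GPUs it simply returns an empty list.

```python
import torch


def gpu_specs():
    """Collect (index, name, total memory in bytes, multiprocessor count)
    for each GPU visible to this process. Hypothetical helper."""
    specs = []
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        specs.append((i, props.name, props.total_memory,
                      props.multi_processor_count))
    return specs


specs = gpu_specs()
for index, name, memory, cores in specs:
    print(index, name, memory, cores)
```

If the listed names or memory sizes differ between devices, the process is seeing a different set of cards than expected, which would explain an imbalance warning.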

What do you mean by the right server? If I were not on the right server, I would not have been able to sign in successfully.
And I don’t know what the error and the warning mean.
By the way, I set the input like this:
calculator = device('cuda')
input_var =

I ran a test using the official example/mnist.
I changed the code this way:

model = Net().to(device)
model = Net()
model=nn.DataParallel(model, device_ids=[0, 1]).to(device)

Then I ran: CUDA_VISIBLE_DEVICES=0,1 python
The same warning appeared.