Runtime Error: tensors are on different GPUs


#1

Hi, I have encountered this problem.

Traceback (most recent call last):
  File "main.py", line 67, in <module>
    network.train()
  File "/home/sp/text-classification-cnn/network/cnnTextNetwork.py", line 115, in train
    logit = self.model(feature)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/sp/text-classification-cnn/model/cnnText/cnntext.py", line 48, in forward
    x = [F.relu(conv(x)).squeeze(3) for conv in self.convs1]
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/conv.py", line 237, in forward
    self.padding, self.dilation, self.groups)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/functional.py", line 39, in conv2d
    return f(input, weight, bias)
RuntimeError: tensors are on different GPUs

The error occurs even when I specify the device through the environment:

CUDA_VISIBLE_DEVICES=0 python main.py

It still doesn’t work.

Can anyone help me out with this?
Thanks.


(Yun Chen) #2

It seems that the model’s parameters and the input are on different GPUs. It would help if you could provide more information, such as the definition of self.model and how you process the input.
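One way to check this is to print where the registered parameters and the input actually live. The snippet below is only a sketch: report_devices is a hypothetical helper, the nn.Linear model is a stand-in for self.model, and the .device attribute assumes a reasonably recent PyTorch (on older releases, .is_cuda and .get_device() play a similar role).

```python
import torch
import torch.nn as nn


def report_devices(model, inputs):
    """Return (set of parameter devices, input device) as strings.

    Hypothetical helper for illustration. Only parameters that are
    registered on the module show up in model.parameters().
    """
    param_devices = {str(p.device) for p in model.parameters()}
    return param_devices, str(inputs.device)


# Stand-in model and input, both left on the CPU for this example.
model = nn.Linear(4, 2)
x = torch.randn(3, 4)

param_devices, input_device = report_devices(model, x)
print("parameters on:", param_devices)
print("input on:", input_device)
```

If the two do not match, or if the parameter set looks smaller than expected, that points at either the input or some layer not having been moved along with the model.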


#3

Thanks for your reply.
This is the code structure I used. There are two GPUs, but I explicitly specify that only one of them should be used.

Note: There are embedding layers in the model.

class cnnTextNetwork(Configurable):
  def __init__(self, option, model, *args, **cargs):
    # Other Operations

    if self.use_gpu:
      self.model = model(self.args).cuda()
    else:
      self.model = model(self.args)
    return

  def train(self):
    # Other Operations
    while True:
      for batch in self.train_minibatch():
        self.model.train()
        feature, target = batch['text'], batch['label']
        if self.use_gpu:
          feature = Variable(torch.from_numpy(feature).cuda())
        else:
          feature = Variable(torch.from_numpy(feature))
        optimizer.zero_grad() 
        logit = self.model(feature)
        # Other Operations

(Yun Chen) #4

I guess self.convs1 is a plain Python list:

File "/home/sp/text-classification-cnn/model/cnnText/cnntext.py", line 48, in forward
x = [F.relu(conv(x)).squeeze(3) for conv in self.convs1]

It seems each conv has to be an nn.Parameter or nn.Module attribute so that it is moved to the GPU when you execute model.cuda().
So try:

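class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        ...
        # Assign each conv directly as an attribute so nn.Module registers it.
        self.convs1_1 = nn.Conv2d(...)  # arguments elided
        self.convs1_2 = nn.Conv2d(...)  # arguments elided

    def forward(self, input):
        ...
        results = []
        results.append(self.convs1_1(x))
        results.append(self.convs1_2(x))
        ...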

#5

Yes. In the model’s __init__, the definition is as follows.

self.convs1 = [nn.Conv2d(input_channel, output_channel, (K, words_dim), padding=(K-1, 0)) for K in Ks]


(Yun Chen) #6

Or add this in forward:

def forward(self, input):
    if self.use_gpu:
        # Move each conv in the plain list to the GPU by hand.
        self.convs1 = [conv.cuda() for conv in self.convs1]

#7

It works. Thanks very much.


(James Bradbury) #8

Use nn.ModuleList.


(Yun Chen) #9

I can’t delete my earlier post, but using nn.ModuleList is the right way.
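As a minimal sketch of the difference (assuming Python 3 and a recent PyTorch; the class names, constructor arguments and sizes below are made up for illustration and are not the original poster's model): layers stored in a plain Python list are not registered as submodules, so model.cuda() and model.parameters() do not see them, while the same layers wrapped in nn.ModuleList are registered.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def make_convs(in_ch, out_ch, words_dim, Ks):
    # Same layer definition as in the thread, one conv per kernel height K.
    return [nn.Conv2d(in_ch, out_ch, (K, words_dim), padding=(K - 1, 0)) for K in Ks]


class PlainListCNN(nn.Module):
    """Hypothetical model: convs in a plain list are NOT registered."""

    def __init__(self, in_ch, out_ch, words_dim, Ks):
        super().__init__()
        self.convs1 = make_convs(in_ch, out_ch, words_dim, Ks)


class ModuleListCNN(nn.Module):
    """Hypothetical model: nn.ModuleList registers every conv."""

    def __init__(self, in_ch, out_ch, words_dim, Ks):
        super().__init__()
        self.convs1 = nn.ModuleList(make_convs(in_ch, out_ch, words_dim, Ks))

    def forward(self, x):
        # x has shape (batch, in_ch, seq_len, words_dim); each conv output
        # has a trailing dimension of size 1, which squeeze(3) removes.
        return [F.relu(conv(x)).squeeze(3) for conv in self.convs1]


plain = PlainListCNN(1, 4, 8, [3, 4, 5])
registered = ModuleListCNN(1, 4, 8, [3, 4, 5])

# The plain list hides its parameters from the parent module.
n_plain = len(list(plain.parameters()))
# ModuleList exposes a weight and a bias for each of the three convs.
n_registered = len(list(registered.parameters()))
print(n_plain, n_registered)

outputs = registered(torch.randn(2, 1, 10, 8))
print([tuple(o.shape) for o in outputs])
```

With the ModuleList version, a single model.cuda() call moves every conv together with the rest of the model, so no per-layer .cuda() call in forward is needed, and the optimizer also receives the conv parameters through model.parameters().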


(Bruce) #10

Hi @chenyuntc, could you tell me more about using nn.ModuleList? I get a similar error even though I only use the CPU.

  File "/home/zli/WorkSpace/PyWork/Terraref/panicle_detection/faster_rcnn/network.py", line 16, in forward
    x = self.conv(x)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/conv.py", line 237, in forward
    self.padding, self.dilation, self.groups)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/functional.py", line 40, in conv2d
    return f(input, weight, bias)
RuntimeError: tensors are on different GPUs


(Yun Chen) #11

You only use the CPU?
Did you load a pretrained model?


(Rahul Singh) #12

Hi, can you please elaborate on how to use nn.ModuleList to get rid of the “RuntimeError: tensors are on different GPUs” error?