The error title and my issue are (probably) unrelated: the error says the tensors are on different GPUs, but as I said, I only have one GPU. I managed to reduce this to a minimal example.
First, the code that produces the above error:
import torch
import dataset
from torch.autograd import Variable

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        feat_layer_list = [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M']
        self.layers = []  # plain Python list holding the modules
        in_channels = 3
        for i in range(len(feat_layer_list)):
            if feat_layer_list[i] == 'M':
                self.layers += torch.nn.ModuleList([torch.nn.MaxPool2d(kernel_size=2, stride=2)])
            else:
                conv2d = torch.nn.Conv2d(in_channels, feat_layer_list[i], kernel_size=3, padding=1)
                self.layers += torch.nn.ModuleList([conv2d, torch.nn.ReLU(inplace=True)])
                in_channels = feat_layer_list[i]

    def forward(self, x):
        for i in range(len(self.layers)):  # apply each stored module in order
            x = self.layers[i](x)
        return x

new_model = Net()
new_model = new_model.cuda()
test_data_loader = dataset.test_loader('/data1/VisionDatasets/ImageNet_Fall2011/ILSVRC2012_grouped/test')
criterion = torch.nn.CrossEntropyLoss()
new_model.eval()
correct = 0
total = 0
for i, (batch, label) in enumerate(test_data_loader):
    batch = batch.cuda()
    output = new_model(Variable(batch))
print("Done")
Now, with minimal modifications (basically wrapping the layers in the nn.Sequential() module), my code runs, but I am curious why that is.
import torch
import dataset
from torch.autograd import Variable

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        feat_layer_list = [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M']
        self.layers = []
        in_channels = 3
        for i in range(len(feat_layer_list)):
            if feat_layer_list[i] == 'M':
                self.layers += [torch.nn.MaxPool2d(kernel_size=2, stride=2)]
            else:
                conv2d = torch.nn.Conv2d(in_channels, feat_layer_list[i], kernel_size=3, padding=1)
                self.layers += [conv2d, torch.nn.ReLU(inplace=True)]
                in_channels = feat_layer_list[i]
        self.model = torch.nn.Sequential(*self.layers)  # wrap the layers in nn.Sequential

new_model = Net()
new_model.model = new_model.model.cuda()
test_data_loader = dataset.test_loader('/data1/VisionDatasets/ImageNet_Fall2011/ILSVRC2012_grouped/test')
criterion = torch.nn.CrossEntropyLoss()
new_model.model.eval()
correct = 0
total = 0
for i, (batch, label) in enumerate(test_data_loader):
    batch = batch.cuda()
    output = new_model.model(Variable(batch))
print("Done")
What have I tried:
I searched for this error and found that if I keep modules in a list, I should use torch.nn.ModuleList(myList). I tried that, and the error changed to CUDNN_STATUS_MAPPING_ERROR (yes, I did new_model = new_model.cuda() after that). One more thing: running the first code gives either of the two errors, i.e. "tensors are on different GPUs" or the CUDNN status mapping error, and I don't understand why that varies. A sketch of the ModuleList attempt follows below.
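For completeness, this is roughly what the ModuleList attempt looked like (a minimal sketch; the class name NetML is just for illustration, and the rest of the script was unchanged from the first example):

# Sketch of the ModuleList variant I tried (constructor and forward only).
class NetML(torch.nn.Module):
    def __init__(self):
        super(NetML, self).__init__()
        feat_layer_list = [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M']
        layers = []
        in_channels = 3
        for f in feat_layer_list:
            if f == 'M':
                layers.append(torch.nn.MaxPool2d(kernel_size=2, stride=2))
            else:
                layers.append(torch.nn.Conv2d(in_channels, f, kernel_size=3, padding=1))
                layers.append(torch.nn.ReLU(inplace=True))
                in_channels = f
        # the whole list is assigned as a ModuleList attribute this time
        self.layers = torch.nn.ModuleList(layers)

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x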
I dived into nn.Sequential() to see whether it handles the modules (i.e. the elements of the input argument list) any differently. I did not see anything different, apart from the fact that I keep them simply as a list, i.e. self.layers, and use them in forward() as x = self.layers[i](x).
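While digging, I also ran a quick check that may or may not be related (a minimal sketch; I am assuming that .parameters() is a fair proxy for what .cuda() will actually move):

# Sketch: compare how many parameters each storage style exposes.
# Assumption: .cuda() only relocates parameters the module actually registers.
import torch

class PlainList(torch.nn.Module):
    def __init__(self):
        super(PlainList, self).__init__()
        self.layers = [torch.nn.Conv2d(3, 8, 3)]  # plain Python list

class Registered(torch.nn.Module):
    def __init__(self):
        super(Registered, self).__init__()
        self.layers = torch.nn.ModuleList([torch.nn.Conv2d(3, 8, 3)])

print(len(list(PlainList().parameters())))   # 0 -- the conv's weight and bias are invisible
print(len(list(Registered().parameters())))  # 2 -- weight and bias are registered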
Though I have managed to work around the issue by using nn.Sequential(), I am curious: what am I doing differently from nn.Sequential() that causes this error?