The error title and my issue are (probably) unrelated: the error says the tensors are on different GPUs, but as I said, I only have one GPU. I managed to reduce this to a minimal example.
First, the code that produces the above error:
import torch
import dataset
from torch.autograd import Variable

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        feat_layer_list = [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M']
        self.layers = []  # plain Python list holding the modules
        in_channels = 3
        for i in range(len(feat_layer_list)):
            if feat_layer_list[i] == 'M':
                self.layers += torch.nn.ModuleList([torch.nn.MaxPool2d(kernel_size=2, stride=2)])
            else:
                conv2d = torch.nn.Conv2d(in_channels, feat_layer_list[i], kernel_size=3, padding=1)
                self.layers += torch.nn.ModuleList([conv2d, torch.nn.ReLU(inplace=True)])
                in_channels = feat_layer_list[i]

    def forward(self, x):
        for i in range(len(self.layers)):  # apply each stored module in order
            x = self.layers[i](x)
        return x

new_model = Net()
new_model = new_model.cuda()
test_data_loader = dataset.test_loader('/data1/VisionDatasets/ImageNet_Fall2011/ILSVRC2012_grouped/test')
criterion = torch.nn.CrossEntropyLoss()
new_model.eval()
correct = 0
total = 0
for i, (batch, label) in enumerate(test_data_loader):
    batch = batch.cuda()
    output = new_model(Variable(batch))
print("Done")
Now, with minimal modifications (basically wrapping the layers in the nn.Sequential() module), my code runs, but I am curious why that is.
import torch
import dataset
from torch.autograd import Variable

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        feat_layer_list = [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M']
        self.layers = []
        in_channels = 3
        for i in range(len(feat_layer_list)):
            if feat_layer_list[i] == 'M':
                self.layers += [torch.nn.MaxPool2d(kernel_size=2, stride=2)]
            else:
                conv2d = torch.nn.Conv2d(in_channels, feat_layer_list[i], kernel_size=3, padding=1)
                self.layers += [conv2d, torch.nn.ReLU(inplace=True)]
                in_channels = feat_layer_list[i]
        self.model = torch.nn.Sequential(*self.layers)  # wrap the layers in nn.Sequential

new_model = Net()
new_model.model = new_model.model.cuda()
test_data_loader = dataset.test_loader('/data1/VisionDatasets/ImageNet_Fall2011/ILSVRC2012_grouped/test')
criterion = torch.nn.CrossEntropyLoss()
new_model.model.eval()
correct = 0
total = 0
for i, (batch, label) in enumerate(test_data_loader):
    batch = batch.cuda()
    output = new_model.model(Variable(batch))
print("Done")
What have I tried:
I searched for this error and found that if I keep modules in a list, I should use torch.nn.ModuleList(myList). I tried that, and the error changed to CUDNN_STATUS_MAPPING_ERROR (yes, I did new_model = new_model.cuda() after that). One more thing: running the first code gives either of the two errors, i.e. "tensors are on different GPUs" or the CUDNN status mapping error, and I don't understand why that varies. A sketch of the ModuleList attempt follows below.
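For completeness, this is roughly what the ModuleList attempt looked like (a minimal sketch; the class name NetML is just for illustration, and the rest of the script was unchanged from the first example):

# Sketch of the ModuleList variant I tried (constructor and forward only).
class NetML(torch.nn.Module):
    def __init__(self):
        super(NetML, self).__init__()
        feat_layer_list = [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M']
        layers = []
        in_channels = 3
        for f in feat_layer_list:
            if f == 'M':
                layers.append(torch.nn.MaxPool2d(kernel_size=2, stride=2))
            else:
                layers.append(torch.nn.Conv2d(in_channels, f, kernel_size=3, padding=1))
                layers.append(torch.nn.ReLU(inplace=True))
                in_channels = f
        # the whole list is assigned as a ModuleList attribute this time
        self.layers = torch.nn.ModuleList(layers)

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x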
I dived into nn.Sequential() to see whether it handles the modules (i.e. the elements of the input argument list) any differently. I did not see anything different, apart from the fact that I keep them simply as a list, i.e. self.layers, and use them in forward() as x = self.layers[i](x).
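While digging, I also ran a quick check that may or may not be related (a minimal sketch; I am assuming that .parameters() is a fair proxy for what .cuda() will actually move):

# Sketch: compare how many parameters each storage style exposes.
# Assumption: .cuda() only relocates parameters the module actually registers.
import torch

class PlainList(torch.nn.Module):
    def __init__(self):
        super(PlainList, self).__init__()
        self.layers = [torch.nn.Conv2d(3, 8, 3)]  # plain Python list

class Registered(torch.nn.Module):
    def __init__(self):
        super(Registered, self).__init__()
        self.layers = torch.nn.ModuleList([torch.nn.Conv2d(3, 8, 3)])

print(len(list(PlainList().parameters())))   # 0 -- the conv's weight and bias are invisible
print(len(list(Registered().parameters())))  # 2 -- weight and bias are registered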
Though I have managed to work around the issue by using nn.Sequential(), I am curious: what am I doing differently from nn.Sequential() that causes this error?