Multiple-GPU Error - Data Parallel

Hi there, I’m trying to run my code across multiple GPUs and am getting the following error:
RuntimeError: Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 1 does not equal 0 (while checking arguments for cudnn_convolution)

I’ve seen a few posts around here and on https://github.com/pytorch/pytorch/, but nothing seems to be of use for me. I’m using a pre-trained model from https://github.com/osmr/imgclsmob, and have modified the forward function to return the activations as well as the output. Here’s a simplified version of my code:

from pytorchcv.model_provider import get_model as ptcv_get_model
import torch
import torch.nn as nn
import types

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

net = ptcv_get_model("densenet40_k12_cifar10", root = 'loc', pretrained=True)
def my_forward(self, x):
    activations = []
    for module in self.features._modules.values():
        x = module(x) #error happens here
        activations.append(x)
    x = x.view(x.size(0), -1)
    x = self.output(x)
    return x, activations

net.forward = types.MethodType(my_forward, net)

if torch.cuda.device_count() > 1:
    net = nn.DataParallel(net, device_ids=[0,1,2,3])
net.to(device)
net.eval()

And my full error message is:

Traceback (most recent call last):
  File "main.py", line 471, in <module>
    train_student(student, teach)
  File "main.py", line 155, in train_student
    outputs_teacher, ints_teacher = teach(inputs)
  File "/home/s1874193/miniconda3/envs/distill/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/s1874193/miniconda3/envs/distill/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/s1874193/miniconda3/envs/distill/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/s1874193/miniconda3/envs/distill/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in parallel_apply
    raise output
  File "/home/s1874193/miniconda3/envs/distill/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 59, in _worker
    output = module(*input, **kwargs)
  File "/home/s1874193/miniconda3/envs/distill/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "main.py", line 364, in my_forward
    x = module_val(x)
  File "/home/s1874193/miniconda3/envs/distill/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/s1874193/miniconda3/envs/distill/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 338, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 1 does not equal 0 (while checking arguments for cudnn_convolution)

Any suggestions would be really appreciated. Thanks!

Do you create any tensors, parameters or modules on-the-fly in your forward method?
Could you post a code snippet to reproduce this error, so that we could have a look?

Thanks @ptrblck. This reproduces the error for me:

import os
from pytorchcv.model_provider import get_model as ptcv_get_model
import torch
import types
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms

os.environ["CUDA_VISIBLE_DEVICES"] = '0,1,2,3'
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

cifar_loc = '/disk/scratch/s1874193/datasets/cifar'

net = ptcv_get_model("densenet40_k12_cifar10", root = '/home/s1874193/Distillation/xdistill/pre_trained_models', pretrained=True)
def my_forward(self, x):
    activations = []
    for module in self.features._modules.values():
        x = module(x)
        activations.append(x)
    x = x.view(x.size(0), -1)
    x = self.output(x)
    return x, activations

net.forward = types.MethodType(my_forward, net)

if torch.cuda.device_count() > 1:
    net = nn.DataParallel(net, device_ids=[0,1,2,3])
net.to(device)
net.eval()

x = torch.randn(4, 3, 32, 32)
out, act = net(x)


I’m not doing anything with the forward() method other than what you can see here. I think it’s somehow related to how I’m using CIFAR, as I didn’t get the error when just doing

x = torch.randn(1, 3, 32, 32)
out, activations = net(x)

Thanks!

I’m not sure about the conclusion.
Try to pass more than a single sample and you should see the same error (with a single sample there is nothing for DataParallel to split across the other GPUs, so everything runs on device 0):

x = torch.randn(4, 3, 32, 32)
out, act = net(x)

I’ll try to dig into it a bit later.

Yes, you’re right about that. Thank you! I’ll keep trying to get somewhere myself. I’ve edited the OP to clean things up using

x = torch.randn(4, 3, 32, 32)
out, act = net(x)

UPDATE: I managed to fix this by wrapping my model in a new class that contains my edited forward function, instead of using types.MethodType. It’s also important not to call .to(device) on anything inside the forward pass, as that moves tensors back to a fixed device after DataParallel has already scattered them.

os.environ["CUDA_VISIBLE_DEVICES"] = '0,1,2,3'
device = torch.device(torch.cuda.current_device() if torch.cuda.is_available() else "cpu")

net = ptcv_get_model("densenet40_k12_cifar10", pretrained=True)

class ReturnLayers(nn.Module):
    def __init__(self, model):
        super(ReturnLayers, self).__init__()
        self.model = model

    def forward(self, x):
        activations = []
        for module in self.model.features._modules.values():
            x = module(x)
            activations.append(x)
        x = x.view(x.size(0), -1)
        x = self.model.output(x)
        return x, activations

net = ReturnLayers(net).to(device)

if torch.cuda.device_count() > 1:
    net = nn.DataParallel(net)

net.eval()
x = torch.randn(4, 3, 32, 32)
out, act = net(x)


I’m running into a similar problem.
So your conclusion is:

do not modify a network’s methods (e.g. forward) with types.MethodType when you are going to use nn.DataParallel.

Right?

So you just created a new class instead of modifying the existing one.

But I’d still like to find a way to parallelize a modified version of an existing model without creating a new model class by copying the architecture code.
Is there a good way to do this?
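
One option that avoids copying the architecture is to keep the pretrained model untouched and wrap it in a thin nn.Module whose forward takes the wrapped model as an argument, rather than binding a new method to the original instance. That binding is presumably why the MethodType approach fails: the bound forward keeps referring to the original module, whose parameters stay on GPU 0, while DataParallel runs the replicas on other devices. Here’s a minimal sketch of the idea; ForwardWrapper and densenet_forward are just illustrative names, not part of pytorchcv or PyTorch:

import torch
import torch.nn as nn
from pytorchcv.model_provider import get_model as ptcv_get_model

class ForwardWrapper(nn.Module):
    # Generic wrapper: holds the original model as a registered submodule
    # and delegates to a custom forward function that receives that submodule.
    def __init__(self, model, forward_fn):
        super(ForwardWrapper, self).__init__()
        self.model = model          # replicated onto each GPU by DataParallel
        self.forward_fn = forward_fn

    def forward(self, x):
        # self.model here is the replica's copy, so its parameters live on the
        # same device as x after DataParallel has scattered the batch
        return self.forward_fn(self.model, x)

def densenet_forward(model, x):
    # Same logic as the edited forward above, written as a plain function that
    # takes the model as an argument instead of capturing it via MethodType
    activations = []
    for module in model.features._modules.values():
        x = module(x)
        activations.append(x)
    x = x.view(x.size(0), -1)
    return model.output(x), activations

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net = ptcv_get_model("densenet40_k12_cifar10", pretrained=True)
net = ForwardWrapper(net, densenet_forward).to(device)
if torch.cuda.device_count() > 1:
    net = nn.DataParallel(net)
net.eval()

out, act = net(torch.randn(4, 3, 32, 32))

This is essentially the same fix as the ReturnLayers class above, just parameterised by the forward function, so the wrapper can be reused for different models without touching their code.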