RuntimeError: Expected all tensors to be on the same device, but found at least two devices, while using LayerNorm

While using PyTorch version 1.9.0, I’m getting an error saying that my tensors are on two different devices. The error trace leads me to the LayerNorm call whose result is assigned to the variable h. But when I check

print(h.is_cuda)

it returns True. Therefore, I’m confused about what is causing this error and how to solve it.

File "C:/Users/user/AppData/Roaming/JetBrains/PyCharmCE2020.2/scratches/abc.py", line 206, in forward
    h = nn.LayerNorm(h.shape[1])(h)
  File "C:\Users\user\anaconda3\envs\paper_2\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\user\anaconda3\envs\paper_2\lib\site-packages\torch\nn\modules\normalization.py", line 174, in forward
    input, self.normalized_shape, self.weight, self.bias, self.eps)
  File "C:\Users\user\anaconda3\envs\paper_2\lib\site-packages\torch\nn\functional.py", line 2346, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking arugment for argument weight in method wrapper_native_layer_norm)

Update #1:

After following the stack trace, I reached the forward function in normalization.py and checked the variables present there:

def forward(self, input: Tensor) -> Tensor:
    print("Foo")
    print("Check if weight is CUDA", self.weight.is_cuda)
    print("Check if bias is CUDA", self.bias.is_cuda)
    print("Check if input is CUDA", input.is_cuda)
    # print("Check if normalized shape is CUDA", self.normalized_shape.is_cuda)
    return F.layer_norm(
        input, self.normalized_shape, self.weight, self.bias, self.eps)

Check if weight is CUDA False
Check if bias is CUDA False
Check if input is CUDA True
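
That made me run a small standalone check (just a sketch with an arbitrary size, assuming a CUDA-capable machine) to confirm that a freshly constructed module keeps its parameters on the CPU by default:

import torch
import torch.nn as nn

ln = nn.LayerNorm(64)                     # constructed fresh, exactly as in my forward()
print(ln.weight.device, ln.bias.device)   # cpu cpu -- new parameters are created on the CPU by default
x = torch.randn(8, 64, device='cuda')     # the activations, however, live on the GPU
# ln(x) would raise the same "two devices" RuntimeError at this point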

Therefore, it is the weight and the bias inside the LayerNorm module that are causing this issue. A quick hack I used to get the function running is shown below; however, I am not sure whether this technique is appropriate:

h = h.to(device='cpu')            # move activations to the CPU, where the LayerNorm parameters are
h = nn.LayerNorm(h.shape[1])(h)
h = h.to(device='cuda')           # move the result back to the GPU
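
A slightly less intrusive variant (again just a sketch, not something I have benchmarked) is to move the freshly created LayerNorm onto h's device instead of shuttling the activations to the CPU and back. Either way the module is rebuilt on every forward pass, so its weight and bias are never trained:

h = nn.LayerNorm(h.shape[1]).to(h.device)(h)   # build the module, move its parameters to h's device, then apply it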

Here is a minimal reproducible example:

import math, random
from sklearn.datasets import load_sample_images
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import torch.autograd as autograd
import torch.nn.functional as F

### Obtain a sample image and preprocess it ###
dataset = load_sample_images()
first_img_data = dataset.images[0]
first_img_data  = first_img_data.reshape(-1, 427, 640)
first_img_data = first_img_data[1, :, :]
first_img_data = first_img_data[0:84, 0:84].reshape(-1, 84,84)
first_img_data = torch.tensor(first_img_data)
#################################################################################################################################


USE_CUDA = torch.cuda.is_available()
Variable = lambda *args, **kwargs: autograd.Variable(*args, **kwargs).cuda() if USE_CUDA else autograd.Variable(*args, **kwargs)


class Cnn(nn.Module):
    def __init__(self, input_shape):
        super(Cnn, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(input_shape[0], 32, kernel_size=8, stride=4),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2),
            nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1),
            nn.ReLU()
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)
        # If you uncomment the line below, it'll throw an error!
        #x = nn.LayerNorm(x.shape[1])(x)
        return x

state = first_img_data
Shape = (1,84, 84)
current_model = Cnn(Shape)
current_model.to('cuda')
state   = Variable(torch.FloatTensor(np.float32(state)).unsqueeze(0), volatile=True)
q_value = current_model.forward(state)

P.S. There is a similar question here (pytorch running: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu), but I couldn’t obtain an answer by following the steps given there.


Could you post an executable code snippet to reproduce this issue, please?

I have extended the question. Please let me know if you still need an executable code snippet.

Yes, an executable code snippet would still be needed, as I cannot reproduce the issue via:

# CPU
input = torch.randn(20, 5, 10, 10)
m = nn.LayerNorm(input.size()[1:])
output = m(input)

# GPU
input = input.cuda()
m.cuda()
output = m(input)

Hello ptrblck,

Thank you for your interest and help. I have included a minimal reproducible example for your reference. Please let me know if you need any more information.

I figured it out. The LayerNorm needs to be declared in the __init__ method rather than created inside the forward method, so that current_model.to('cuda') moves its weight and bias along with the rest of the model.
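
For reference, here is a minimal sketch of that fix applied to the Cnn class above (inferring the flattened feature size with a dummy forward pass in __init__ is just one way to size the LayerNorm; hard-coding it would work as well):

import torch
import torch.nn as nn

class Cnn(nn.Module):
    def __init__(self, input_shape):
        super(Cnn, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(input_shape[0], 32, kernel_size=8, stride=4),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2),
            nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1),
            nn.ReLU()
        )
        # Infer the flattened feature size with a dummy forward pass so the
        # LayerNorm can be registered as a submodule here in __init__.
        with torch.no_grad():
            n_features = self.features(torch.zeros(1, *input_shape)).view(1, -1).size(1)
        self.layer_norm = nn.LayerNorm(n_features)

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)
        # Registered in __init__, so current_model.to('cuda') moves its weight/bias too.
        x = self.layer_norm(x)
        return x

With this change, current_model.to('cuda') moves the LayerNorm’s weight and bias together with the convolution weights, and the minimal example above runs without the device mismatch.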