RuntimeError: Tensor for 'out' is on CPU, Tensor for argument #1 'self' is on CPU, but expected them to be on GPU (while checking arguments for addmm)

Hi

I defined a ResNet as follows:

# Residual Block
class DenseResidual(torch.nn.Module):
    def __init__(self, inp_dim, neurons, layers, **kwargs):
        super(DenseResidual, self).__init__(**kwargs)
        self.h1 = torch.nn.Linear(inp_dim, neurons)
        self.hidden = [torch.nn.Linear(neurons, neurons) 
                      for _ in range(layers-1)]
        
    def forward(self, inputs):
        h = torch.tanh(self.h1(inputs))
        x = h
        for layer in self.hidden:
            x = torch.tanh(layer(x))
            
        # Defining Residual Connection and returning
        return x + h

# ResNet Architecture
class MyResNet(torch.nn.Module):
    def __init__(self, **kwargs):
        super(MyResNet, self).__init__(**kwargs)
        self.b1 = DenseResidual(2, 8, 3)
        self.b2 = DenseResidual(8, 16, 3)
        self.hn = torch.nn.Linear(16, 8)
        self.out = torch.nn.Linear(8, 1)
        
    def forward(self, inputs):
        x = self.b1(inputs)
        x = self.b2(x)
        x = torch.tanh(self.hn(x))
        x = self.out(x)
        return x

model = MyResNet()

When I run the forward pass with the following code:

model.to(device)
optimizer = torch.optim.Adam(model.parameters())
criterion = torch.nn.MSELoss()

EPOCHS = 5

for epoch in range(EPOCHS):
    optimizer.zero_grad()
    
    train_m.requires_grad = True
    p = model(train_m)
    print(p)

I get this error message:

RuntimeError                              Traceback (most recent call last)
<ipython-input-22-cf2450a381be> in <module>
      9 
     10     train_m.requires_grad = True
---> 11     p = model(train_m)
     12     print(p)
     13 

~\miniconda3\envs\torch\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

<ipython-input-20-e99f45da034d> in forward(self, inputs)
      9 
     10     def forward(self, inputs):
---> 11         x = self.b1(inputs)
     12         x = self.b2(x)
     13         x = torch.tanh(self.hn(x))

~\miniconda3\envs\torch\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

<ipython-input-14-9481614a812a> in forward(self, inputs)
     11         x = h
     12         for layer in self.hidden:
---> 13             x = torch.tanh(layer(x))
     14 
     15         # Defining Residual Connection and returning

~\miniconda3\envs\torch\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

~\miniconda3\envs\torch\lib\site-packages\torch\nn\modules\linear.py in forward(self, input)
     91 
     92     def forward(self, input: Tensor) -> Tensor:
---> 93         return F.linear(input, self.weight, self.bias)
     94 
     95     def extra_repr(self) -> str:

~\miniconda3\envs\torch\lib\site-packages\torch\nn\functional.py in linear(input, weight, bias)
   1688     if input.dim() == 2 and bias is not None:
   1689         # fused op is marginally faster
-> 1690         ret = torch.addmm(bias, input, weight.t())
   1691     else:
   1692         output = input.matmul(weight.t())

RuntimeError: Tensor for 'out' is on CPU, Tensor for argument #1 'self' is on CPU, but expected them to be on GPU (while checking arguments for addmm)

Since I've moved the model to the GPU, I can't understand why this is happening.


Did you also move the input data to the GPU?
Note that you would have to reassign tensors (unlike modules):

tensor = tensor.to('cuda:0') # needs assignment
model.to('cuda:0') # works without
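
In context, a minimal sketch using the names from the original post (train_m standing in for the training inputs):

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
model.to(device)              # module: moved in place, no reassignment needed
train_m = train_m.to(device)  # tensor: the result must be assigned back
p = model(train_m)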

I have been getting the exact same error, even with the reassignment. Could there be any other issue?

Could you post a code snippet to reproduce this issue, so that we can have a look?

I got the same error as yours :sob:

Hi, I found the solution to this problem. I forgot to use nn.ModuleList in the class defining the residual block. When I added it, the code ran perfectly. Here's the modified code:

# Residual Block
class DenseResidual(torch.nn.Module):
    def __init__(self, inp_dim, neurons, layers, **kwargs):
        super(DenseResidual, self).__init__(**kwargs)
        self.h1 = torch.nn.Linear(inp_dim, neurons)
        # Using ModuleList so that this layer list is registered
        # and can be moved to CUDA along with the rest of the model
        self.hidden = torch.nn.ModuleList(
            [torch.nn.Linear(neurons, neurons) for _ in range(layers-1)]
        )
        
    def forward(self, inputs):
        h = torch.tanh(self.h1(inputs))
        x = h
        for layer in self.hidden:
            x = torch.tanh(layer(x))
            
        # Defining Residual Connection and returning
        return x + h
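
For context on why this works: a plain Python list hides its entries from PyTorch, so the hidden layers were never registered as submodules and model.to(device) silently skipped their parameters. nn.ModuleList registers them. A quick hypothetical check (not from the original code):

block = DenseResidual(2, 8, 3)
# plain list: only h1's weight and bias are registered -> 2 parameters
# nn.ModuleList: the two hidden layers are registered too -> 6 parameters
print(len(list(block.parameters())))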

I got the same bug as yours, but I do use ModuleList. Have you solved the problem?

Yes, I solved my problem using ModuleList.

import copy
import torch.nn as nn

def clones(module, N):
    "Produce N identical layers."
    return nn.ModuleList([copy.deepcopy(module) for _ in range(N)])

class Encoder(nn.Module):
    "Core encoder is a stack of N layers"
    def __init__(self, layer, N):
        # layer = one EncoderLayer object, N=6
        super(Encoder, self).__init__()
        self.layers = clones(layer, N)  # deep copy, N=6
        self.norm = LayerNorm(layer.size)

    def forward(self, x, mask):
        "Pass the input (and mask) through each layer in turn."
        # x is shaped like (30, 10, 512),
        # i.e. (batch.size, sequence.len, d_model)
        # mask is a matrix shaped like (batch.size, 10, 10)
        for layer in self.layers:
            x = layer(x, mask)
           
        return self.norm(x)

class EncoderLayer(nn.Module):
    "Encoder is made up of self-attn and "
    "feed forward (defined below)"
    def __init__(self, size, self_attn, feed_forward, dropout):

        super(EncoderLayer, self).__init__()
        self.self_attn = self_attn
        self.feed_forward = feed_forward
        self.sublayer = clones(SublayerConnection(size, dropout), 2)
        # deep-clone two complete copies of SublayerConnection
        self.size = size # 512

    def forward(self, x, mask):
        "Follow Figure 1 (left) for connections."
        # x shape = (batch_size, sequence_length, d_model)
        # mask is a (batch.size, sequence_length, sequence_length) matrix,
        # marking, for the current word w, which words are visible to w
        x = self.sublayer[0](x,
          lambda x: self.self_attn(x, x, x, mask))
        # x (batch_size, sequence_length, d_model) -> self_attn (MultiHeadAttention)
        # shape is same (batch_size, sequence_length, d_model) -> SublayerConnection
        # -> (batch_size, sequence_length, d_model)
        return self.sublayer[1](x, self.feed_forward)
        # x, together with the feed_forward object, goes to the second SublayerConnection

class Encoder_model(nn.Module):
    def __init__(self, encoder, src_embed, generator):
        super(Encoder_model, self).__init__()
        self.encoder = encoder
        self.src_embed = src_embed
        self.generator = generator
        
    def forward(self, src, src_mask):
        out = self.encoder(self.src_embed(src), src_mask)
        out = self.generator(out)
        return out

def make_Encoder_model(src_vocab, N=6, d_model=64, d_ff=512, h=8, input_dim=168, out_dim=24, dropout=0):
    c = copy.deepcopy
    attn = MultiHeadedAttention(h, d_model)
    ff = PositionwiseFeedForward(d_model, d_ff, dropout)
    position = PositionalEncoding(d_model, dropout)

    model = Encoder_model(Encoder(EncoderLayer(d_model, c(attn), c(ff), dropout), N),
                          nn.Sequential(Embeddings(d_model, src_vocab), c(position)),
                          EncoderGenerator(d_model, 1, input_dim, out_dim))
    for p in model.parameters():
        if p.dim() > 1:
            nn.init.xavier_uniform_(p)  # in-place version; nn.init.xavier_uniform is deprecated
    return model

model = make_Encoder_model(src_vocab, N, d_model, d_ff, h, vector_in_dim, vector_out_dim-1, dropout)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = model.to(device)

Could you help me to solve my problem? :mask:

/content/Transformer_forecasting/Module/model.py:113: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
  nn.init.xavier_uniform(p)
Traceback (most recent call last):
  File "main.py", line 78, in <module>
    Train()
  File "main.py", line 52, in Train
    myloss = run_module.run_epoch(epoch, DataSet.construct_batch(dataloader_train, vector_in_dim, vector_out_dim), model, loss, model_name)
  File "/content/Transformer_forecasting/train_module/run_module.py", line 60, in run_epoch
    out = model.forward(batch.src, batch.src_mask)
  File "/content/Transformer_forecasting/Module/model.py", line 64, in forward
    out = self.encoder(self.src_embed(src), src_mask)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/Transformer_forecasting/Module/Embedding.py", line 17, in forward
    return self.lut(x)*math.sqrt(self.d_model)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/linear.py", line 93, in forward
    return F.linear(input, self.weight, self.bias)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 1692, in linear
    output = input.matmul(weight.t())
RuntimeError: Tensor for 'out' is on CPU, Tensor for argument #1 'self' is on CPU, but expected them to be on GPU (while checking arguments for addmm)
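
From your trace, the failure is in Embeddings.forward at self.lut(x), so the model's weights are on the GPU but batch.src apparently is not. As noted above, tensors have to be reassigned when moved. A hedged sketch using the names from your snippet (the exact structure of run_epoch and batch is an assumption):

# hypothetical placement inside run_epoch, before the forward pass
batch.src = batch.src.to(device)            # tensors must be reassigned
batch.src_mask = batch.src_mask.to(device)
out = model.forward(batch.src, batch.src_mask)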

Thanks, that was also my problem: I also used a normal list for multiple inputs.

This worked for me; I was not reassigning the tensor.


@ptrblck I’m facing a similar error:

model = torch.nn.Sequential(
    torch.nn.Conv1d(1, 1024, kernel_size=7, stride=1, padding=0, dilation=1, groups=1, bias=True),
    torch.nn.BatchNorm1d(1024),
    torch.nn.ReLU(inplace=True),
    torch.nn.Conv1d(1024, 1024, kernel_size=1, stride=1, padding=0, dilation=1, groups=1, bias=True),
    torch.nn.BatchNorm1d(1024),
    torch.nn.ReLU(inplace=True),
    torch.nn.Conv1d(1024, 51, kernel_size=1, stride=1, padding=0, dilation=1, groups=1, bias=True)
    )

classifier = torch.nn.Linear(51 * 7, 51)
model.cuda()

for e in range(num_epochs):
    running_loss = 0
    for batch in train_loader:
        features, labels = batch[:, :-1], batch[:, -1]
        features, labels = features.to(device), labels.to(device)
        features = features.unsqueeze(dim=1)
        outputs  = model(features)
        outputs  = outputs.view(outputs.size(0), -1)
        scores   = classifier(outputs)
        loss = criterion(outputs, labels.long())
        loss.backward()
        optimizer.zero_grad()
        optimizer.step()
        running_loss += loss.item()

Here’s the trace

RuntimeError                              Traceback (most recent call last)
<ipython-input-93-8ef77d1136dc> in <module>
     11         outputs  = model(features)
     12         outputs  = outputs.view(outputs.size(0), -1)
---> 13         scores   = classifier(outputs)
     14         loss = criterion(outputs, labels.long())
     15         loss.backward()

~/miniconda3/envs/mytraining/lib/python3.6/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

~/miniconda3/envs/mytraining/lib/python3.6/site-packages/torch/nn/modules/linear.py in forward(self, input)
     92 
     93     def forward(self, input: Tensor) -> Tensor:
---> 94         return F.linear(input, self.weight, self.bias)
     95 
     96     def extra_repr(self) -> str:

~/miniconda3/envs/mytraining/lib/python3.6/site-packages/torch/nn/functional.py in linear(input, weight, bias)
   1751     if has_torch_function_variadic(input, weight):
   1752         return handle_torch_function(linear, (input, weight), input, weight, bias=bias)
-> 1753     return torch._C._nn.linear(input, weight, bias)
   1754 
   1755 

RuntimeError: Tensor for 'out' is on CPU, Tensor for argument #1 'self' is on CPU, but expected them to be on GPU (while checking arguments for addmm)

You haven't pushed classifier to the GPU, only model, so you should add classifier.cuda() to the code.
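
For reference, a hedged sketch of the corrected loop (criterion and optimizer are assumed to be defined elsewhere; note the original snippet also feeds outputs rather than scores to the criterion, and calls optimizer.zero_grad() after backward(), which would wipe the gradients before the step):

classifier = torch.nn.Linear(51 * 7, 51)
model.cuda()
classifier.cuda()  # the missing move

for e in range(num_epochs):
    running_loss = 0
    for batch in train_loader:
        features, labels = batch[:, :-1], batch[:, -1]
        features, labels = features.to(device), labels.to(device)
        features = features.unsqueeze(dim=1)

        optimizer.zero_grad()   # clear gradients before backward, not after
        outputs = model(features)
        outputs = outputs.view(outputs.size(0), -1)
        scores = classifier(outputs)
        loss = criterion(scores, labels.long())  # loss on the classifier output
        loss.backward()
        optimizer.step()
        running_loss += loss.item()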


@ptrblck That did the trick. Thank you sir.

I ran into a similar issue that wasn't obvious: I was explicitly setting the weight and bias parameters of a linear layer. Those weights are tensors, so they also need to be moved to CUDA.
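
A minimal sketch of that pitfall (hypothetical layer and shapes, not from any post above):

import torch

device = torch.device('cuda:0')
layer = torch.nn.Linear(4, 2).to(device)

w = torch.randn(2, 4)                            # created on the CPU
# layer.weight = torch.nn.Parameter(w)           # would put the layer back in a mixed state
layer.weight = torch.nn.Parameter(w.to(device))  # move the tensor before assigning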