nn.Parameter() and register_buffer moving to GPU

In this simplified example, I want alpha to be learnable and pe to be constant. Below is my model:

import torch
import torch.nn as nn

class Model(nn.Module):
    
    def __init__(self):
        super().__init__()
        
        self.alpha = nn.Parameter(torch.ones([10]))

        pe = torch.arange(10)
        self.register_buffer("pe", pe)
            
    def forward(self, x):
        x = x + self.alpha * self.pe
        return x

model = Model()

The operation below works fine on CPU:

ip = torch.rand(10)
model(ip)

When I run this on GPU, I get the following error:

ip.to('cuda')
model.to('cuda')
model(ip)
/tmp/ipykernel_1284080/2457928180.py in forward(self, x)
     10 
     11     def forward(self, x):
---> 12         x = x + self.alpha * self.pe
     13         return x

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

What am I doing wrong here?

You need to reassign the tensor if you move it to another device:

ip = ip.to("cuda")
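
That is because Tensor.to() returns a new tensor rather than moving the original in place, while Module.to() moves the module's parameters and buffers in place. Roughly, reusing the model from the question, the corrected sequence would look like this:

ip = torch.rand(10)

model = model.to("cuda")  # nn.Module.to() moves parameters and buffers in place (and returns the module)
ip = ip.to("cuda")        # Tensor.to() returns a new tensor, so reassignment is required

model(ip)                 # alpha, pe and ip are now all on cuda:0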

For anyone who stumbles on a similar question in the future: I was sleepy while writing the question, and the original problem I was working on was a bit more involved, so I didn't manage to reduce it to the simple example I intended.

What I actually wanted was to make both alpha and pe learnable, and this version was giving me the error above:

class Model(nn.Module):
    
    def __init__(self):
        super().__init__()
        
        self.alpha = nn.Parameter(torch.ones([10]))

        self.pe = torch.arange(10)
   
    def forward(self, x):
        x = x + self.alpha * self.pe
        return x

The issue in this model is that self.pe is a plain tensor attribute: it is neither registered as a parameter nor as a buffer, so it is not learnable and model.to('cuda') does not move it, which is what causes the device error.
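
For illustration, here is a quick sketch (the Demo class and the pe_buf name are just made up for this example) of how the three kinds of attributes behave when the module is moved:

import torch
import torch.nn as nn

class Demo(nn.Module):
    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(10))         # learnable, moved by .to()
        self.register_buffer("pe_buf", torch.arange(10))  # not learnable, but moved by .to() and saved in the state_dict
        self.pe = torch.arange(10)                        # plain attribute: not learnable, NOT moved by .to()

m = Demo().to("cuda")
print(m.alpha.device, m.pe_buf.device, m.pe.device)  # cuda:0 cuda:0 cpu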

The fix is:

self.pe = nn.Parameter(torch.arange(10).to(torch.float32))
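
Putting it together, a minimal sketch of the corrected model (same shapes as above, with torch.arange given a float dtype so it can be wrapped in nn.Parameter):

import torch
import torch.nn as nn

class Model(nn.Module):

    def __init__(self):
        super().__init__()

        # Both registered as parameters, so model.to('cuda') moves them
        # and an optimizer can update them.
        self.alpha = nn.Parameter(torch.ones(10))
        self.pe = nn.Parameter(torch.arange(10, dtype=torch.float32))

    def forward(self, x):
        return x + self.alpha * self.pe

model = Model().to("cuda")
ip = torch.rand(10, device="cuda")
out = model(ip)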