Why doesn't model.to(device) move tensors in a custom layer to the same device?

Currently, I have to pass a device parameter into my custom layer and then manually move tensors onto the specified device using .to(device) or device=device.

Is this behavior expected? It looks kind of ugly to me.

Shouldn't model.to(device) move all the layers, including my custom layer, to the device for me?


Could you post a small code snippet reproducing this error?
I tried to reproduce your error, but it seems to work fine:

class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.fc1 = nn.Linear(1, 1)
        
    def forward(self, x):
        x = self.fc1(x)
        return x

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(10, 1)
        self.module1 = MyModule()
    
    def forward(self, x):
        x = self.fc1(x)
        x = self.module1(x)
        return x   

model = Net()
model = model.to('cuda:0')
print(model.module1.fc1.weight.type())
> torch.cuda.FloatTensor
print(model.fc1.weight.type())
> torch.cuda.FloatTensor
import torch.nn as nn
import torch

class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.fc1 = nn.Linear(1, 1)

    def forward(self, x):
        n = torch.arange(0., 6.)  # torch.range is deprecated; this tensor is created on the CPU
        print(n.type())
        #x = x.mm(n)
        x = self.fc1(x)
        return x

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(10, 1)
        self.module1 = MyModule()

    def forward(self, x):
        x = self.fc1(x)
        x = self.module1(x)
        return x

model = Net()
model = model.to('cuda:0')
model(torch.zeros(1, 10).to('cuda:0'))
print(model.module1.fc1.weight.type())
print(model.fc1.weight.type())

Output:

torch.FloatTensor
torch.cuda.FloatTensor
torch.cuda.FloatTensor

The new tensor I define in my custom layer is not on CUDA. As a result, I need to pass in device so that I can move it to that device.

I would expect that, by default, a new tensor would also end up on CUDA once the model has been moved to CUDA.

Ok, this is not a good example, because here I could just move the new tensor to the same device as x.

But the problem I face is that I have to define the weights myself:

self.w = Parameter(torch.zeros(out_features, in_features))

in the __init__ function.

At that point, I still do not know the input device, so self.w ends up on the CPU while my input is on the GPU.

Parameters will be moved as well by .to(), so you don't have to worry about that parameter you add.
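
As a minimal sketch (MyLinear and its sizes are made up for illustration, and it assumes a CUDA device is available):

import torch
import torch.nn as nn
from torch.nn import Parameter

class MyLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        # assigning an nn.Parameter as an attribute registers it,
        # so .to() / .cuda() will move it together with the module
        self.w = Parameter(torch.zeros(out_features, in_features))

    def forward(self, x):
        return x.mm(self.w.t())

layer = MyLinear(10, 1).to('cuda:0')
print(layer.w.device)  # cuda:0 -- the manually defined parameter was moved too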

Ok, I found the problem. It is not with the weight that has been declared as a Parameter; it is with a map that I define in the __init__ function. I need this map to be moved to the same CUDA device as the weight, but I do not want it to be a Parameter.

I tried having these lines

self.w = Parameter(torch.zeros(out_features, in_features))
self.k_map = torch.zeros(self.m+1, device=self.w.device)

Apparently, at initialization the weight is still on the CPU, which causes the map to be created on the CPU as well.

How can I get both of them onto CUDA without making k_map a Parameter?


You could override the .cuda() and/or the .to() method for something like this:

class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.a = torch.nn.Parameter(torch.zeros(1))
        self.b = torch.zeros(1)

    def forward(self, inputs):
        # do some stuff
        return inputs

    def cuda(self, device=None):
        self = super().cuda(device)
        self.b = self.b.cuda(device)
        return self

    def to(self, *args, **kwargs):
        self = super().to(*args, **kwargs)
        self.b = self.b.to(*args, **kwargs)
        return self

EDIT: I recommend overriding only the .to() method, because if you override .cuda() you also have to make sure to override the .cpu() method.


In that case you should use register_buffer.


Thanks a lot! register_buffer is indeed what I am looking for.
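
For reference, here is a minimal sketch of how that could look for the k_map case above (MyLayer and its constructor arguments are made up for illustration):

import torch
import torch.nn as nn
from torch.nn import Parameter

class MyLayer(nn.Module):
    def __init__(self, in_features, out_features, m):
        super().__init__()
        self.m = m
        self.w = Parameter(torch.zeros(out_features, in_features))
        # a buffer is moved by .to()/.cuda() and saved in the state_dict,
        # but it is not returned by .parameters()
        self.register_buffer('k_map', torch.zeros(self.m + 1))

    def forward(self, x):
        # self.w and self.k_map now always live on the same device
        return x.mm(self.w.t())

layer = MyLayer(10, 1, m=4).to('cuda:0')
print(layer.k_map.device)  # cuda:0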

Sorry, I'm new to this: how does register_buffer handle the .to(device) problem here, so that the custom module ends up on the same device as the model it is part of? Thank you so much!

The .to() method is applied to all internal _parameters and _buffers, as can be seen here and here.

The _parameters and _buffers entries are registered when you assign an nn.Parameter as an attribute or call self.register_buffer. Have a look at this example:

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.my_param = nn.Parameter(torch.randn(1))
        self.register_buffer('my_buffer', torch.randn(1))
        self.neither_param_nor_buf = torch.randn(1)
        
        
model = MyModel()
print(dict(model.named_parameters()))
> {'my_param': Parameter containing:
tensor([-1.1189], requires_grad=True)}
print(dict(model.named_buffers()))
> {'my_buffer': tensor([-0.0459])}

model.to('cuda')
print(dict(model.named_parameters()))
> {'my_param': Parameter containing:
tensor([-1.1189], device='cuda:0', requires_grad=True)}
print(dict(model.named_buffers()))
> {'my_buffer': tensor([-0.0459], device='cuda:0')}
print(model.neither_param_nor_buf.device)
> cpu

As you can see, only the parameters and buffers were moved, while the tensor is still on the CPU.


It works for me. Thank you very much. :smiley:

Is there any way to make sure that tensors created in the forward method are also on the appropriate device without passing in the device explicitly? Do you need to register buffers in the forward pass in this case? Seems kinda weird…maybe you could say something like myForwardTensor.to(self.device)?

You could use the device attribute of the input tensor or of any parameter (next(self.parameters()).device).
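
A small sketch of that pattern (the module and sizes are made up; the new tensor simply follows the input's device):

import torch
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(6, 1)

    def forward(self, x):
        # create the new tensor directly on the same device as the input ...
        n = torch.arange(0., 6., device=x.device)
        # ... or use the device of any registered parameter instead:
        # n = torch.arange(0., 6., device=next(self.parameters()).device)
        return self.fc1(x * n)

model = MyModule().to('cuda:0')
out = model(torch.zeros(1, 6, device='cuda:0'))
print(out.device)  # cuda:0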

Seeing this solved my problem! Thanks!

One thing to be careful with here: make sure that you don't assign the tensor to an attribute and then register that attribute as a buffer, like this:

#  Incorrect way
self.my_tensor = torch.randn(1)
self.register_buffer('my_buffer', self.my_tensor)

If you do this and then access the original attribute instead of the buffer, you will find that the device change has not propagated to it.

def forward(self, inputs):
    inputs = inputs / self.my_tensor # ERROR: self.my_tensor is on cpu

This is the right way to do it:

def __init__(self):
    super().__init__()
    self.register_buffer('my_buffer', torch.randn(1))
...
def forward(self, inputs):
    inputs = inputs / self.my_buffer # This works: self.my_buffer is on the GPU

That’s a good point and thanks for sharing.
Since you’ve tagged me I assume one of my code snippets shows this behavior? (I can’t find it here, so could you send me a link to it so that I could add a comment or correct it?)

Nope, your code is correct. I was following your code and ran into this problem because I implemented register_buffer slightly differently than you did. So I thought I’d share to help others avoid my mistake :slight_smile:
