Why doesn't model.to(device) move tensors in a custom layer to the same device?

Currently, I have to pass a device parameter into my custom layer and then manually move tensors onto the specified device using .to(device) or device=device.

Is this behavior expected? It looks kind of ugly to me.

Shouldn't model.to(device) move all the layers, including my custom layer, to the device for me?


Could you post a small code snippet reproducing this error?
I tried to reproduce your error, but it seems to work fine:

class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.fc1 = nn.Linear(1, 1)
        
    def forward(self, x):
        x = self.fc1(x)
        return x

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(10, 1)
        self.module1 = MyModule()
    
    def forward(self, x):
        x = self.fc1(x)
        x = self.module1(x)
        return x   

model = Net()
model = model.to('cuda:0')
print(model.module1.fc1.weight.type())
> torch.cuda.FloatTensor
print(model.fc1.weight.type())
> torch.cuda.FloatTensor
import torch.nn as nn
import torch

class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.fc1 = nn.Linear(1, 1)

    def forward(self, x):
        n = torch.arange(0., 6.)  # torch.range is deprecated; this tensor is created on the CPU
        print(n.type())
        #x = x.mm(n)
        x = self.fc1(x)
        return x

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(10, 1)
        self.module1 = MyModule()

    def forward(self, x):
        x = self.fc1(x)
        x = self.module1(x)
        return x

model = Net()
model = model.to('cuda:0')
model(torch.zeros(1, 10).to('cuda:0'))
print(model.module1.fc1.weight.type())
print(model.fc1.weight.type())

Output:

torch.FloatTensor
torch.cuda.FloatTensor
torch.cuda.FloatTensor

The new tensor I define in my custom layer is not on CUDA. As a result, I need to pass in device so that I can move it to that device.

I would expect that, by default, a new tensor would also end up on CUDA once the model has been moved to CUDA.

Ok, this is not a good example, because here I could just move the new tensor to the same device as x.

But the problem I face is that I have to define the weights myself:

self.w = Parameter(torch.zeros(out_features, in_features))

in the __init__ function.

At that point, I still do not know the input device, so self.w ends up on the CPU while my input is on the GPU.

Parameters will be moved as well by .to(), so you don't have to worry about that parameter you add.
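
As a minimal sketch (MyLinear and its sizes are made up for illustration, and it assumes a CUDA device is available):

import torch
import torch.nn as nn
from torch.nn import Parameter

class MyLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        # assigning an nn.Parameter as an attribute registers it,
        # so .to() / .cuda() will move it together with the module
        self.w = Parameter(torch.zeros(out_features, in_features))

    def forward(self, x):
        return x.mm(self.w.t())

layer = MyLinear(10, 1).to('cuda:0')
print(layer.w.device)  # cuda:0 -- the manually defined parameter was moved too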

Ok, I found the problem. It is not with the weight that has been declared as a Parameter; it is with a map that I define in the __init__ function. I need this map to be moved to the same CUDA device as the weight, but I do not want it to be a Parameter.

I tried having these lines

self.w = Parameter(torch.zeros(out_features, in_features))
self.k_map = torch.zeros(self.m+1, device=self.w.device)

Apparently, at initialization the weight is still on the CPU, which causes the map to be created on the CPU as well.

How can I get both of them onto CUDA without making k_map a Parameter?


You could override the .cuda() and/or the .to() method for something like this:

class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.a = torch.nn.Parameter(torch.zeros(1))
        self.b = torch.zeros(1)

    def forward(self, inputs):
        # do some stuff
        return inputs

    def cuda(self, device=None):
        self = super().cuda(device)
        self.b = self.b.cuda(device)
        return self

    def to(self, *args, **kwargs):
        self = super().to(*args, **kwargs)
        self.b = self.b.to(*args, **kwargs)
        return self

EDIT: I recommend overriding only the .to() method, because if you override .cuda() you also have to make sure to override the .cpu() method.


In that case you should use register_buffer.


Thanks a lot! register_buffer is indeed what I am looking for.
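
For reference, here is a minimal sketch of how that could look for the k_map case above (MyLayer and its constructor arguments are made up for illustration):

import torch
import torch.nn as nn
from torch.nn import Parameter

class MyLayer(nn.Module):
    def __init__(self, in_features, out_features, m):
        super().__init__()
        self.m = m
        self.w = Parameter(torch.zeros(out_features, in_features))
        # a buffer is moved by .to()/.cuda() and saved in the state_dict,
        # but it is not returned by .parameters()
        self.register_buffer('k_map', torch.zeros(self.m + 1))

    def forward(self, x):
        # self.w and self.k_map now always live on the same device
        return x.mm(self.w.t())

layer = MyLayer(10, 1, m=4).to('cuda:0')
print(layer.k_map.device)  # cuda:0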

Sorry, I'm new to this: how does register_buffer handle the .to(device) problem here, so that the custom module ends up on the same device as the model it is part of? Thank you so much!

The .to() method is applied to all internal _parameters and _buffers, as can be seen here and here.

The _parameters and _buffers entries are registered when you assign an nn.Parameter as an attribute or call self.register_buffer. Have a look at this example:

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.my_param = nn.Parameter(torch.randn(1))
        self.register_buffer('my_buffer', torch.randn(1))
        self.neither_param_nor_buf = torch.randn(1)
        
        
model = MyModel()
print(dict(model.named_parameters()))
> {'my_param': Parameter containing:
tensor([-1.1189], requires_grad=True)}
print(dict(model.named_buffers()))
> {'my_buffer': tensor([-0.0459])}

model.to('cuda')
print(dict(model.named_parameters()))
> {'my_param': Parameter containing:
tensor([-1.1189], device='cuda:0', requires_grad=True)}
print(dict(model.named_buffers()))
> {'my_buffer': tensor([-0.0459], device='cuda:0')}
print(model.neither_param_nor_buf.device)
> cpu

As you can see, only the parameters and buffers were moved, while the tensor is still on the CPU.


It works for me. Thank you very much. :smiley:

Is there any way to make sure that tensors created in the forward method are also on the appropriate device without passing in the device explicitly? Do you need to register buffers in the forward pass in this case? Seems kinda weird…maybe you could say something like myForwardTensor.to(self.device)?

You could use the device attribute of the input tensor or of any parameter (next(self.parameters()).device).
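
A small sketch of that pattern (the module and sizes are made up; the new tensor simply follows the input's device):

import torch
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(6, 1)

    def forward(self, x):
        # create the new tensor directly on the same device as the input ...
        n = torch.arange(0., 6., device=x.device)
        # ... or use the device of any registered parameter instead:
        # n = torch.arange(0., 6., device=next(self.parameters()).device)
        return self.fc1(x * n)

model = MyModule().to('cuda:0')
out = model(torch.zeros(1, 6, device='cuda:0'))
print(out.device)  # cuda:0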

Seeing this solved my problem! Thanks!

One thing to be careful with here: make sure that you don't assign the tensor to an attribute and then register that attribute as a buffer, like this:

#  Incorrect way
self.my_tensor = torch.randn(1)
self.register_buffer('my_buffer', self.my_tensor)

If you do this and then access the original attribute instead of the buffer, you will find that the device change has not propagated to it.

def forward(self, inputs):
    inputs = inputs / self.my_tensor # ERROR: self.my_tensor is on cpu

This is the right way to do it:

def __init__(self):
    super().__init__()
    self.register_buffer('my_buffer', torch.randn(1))
...
def forward(self, inputs):
    inputs = inputs / self.my_buffer # This works: self.my_buffer is on the GPU

That’s a good point and thanks for sharing.
Since you’ve tagged me I assume one of my code snippets shows this behavior? (I can’t find it here, so could you send me a link to it so that I could add a comment or correct it?)

Nope, your code is correct. I was following your code and ran into this problem because I implemented register_buffer slightly differently than you did. So I thought I’d share to help others avoid my mistake :slight_smile:
