Why doesn't model.to(device) move tensors in a custom layer to the same device?

Currently, I have to pass a device parameter into my custom layer and then manually put tensors onto the specified device using .to(device) or device=device.

Is this behavior expected? It looks kind of ugly to me.

Shouldn’t model.to(device) move all the layers, including my custom layer, to the device for me?

Could you post a small code snippet reproducing this issue?
I tried to reproduce your error, but the following seems to work fine:

class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.fc1 = nn.Linear(1, 1)
        
    def forward(self, x):
        x = self.fc1(x)
        return x

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(10, 1)
        self.module1 = MyModule()
    
    def forward(self, x):
        x = self.fc1(x)
        x = self.module1(x)
        return x   

model = Net()
model = model.to('cuda:0')
print(model.module1.fc1.weight.type())
> torch.cuda.FloatTensor
print(model.fc1.weight.type())
> torch.cuda.FloatTensor
import torch.nn as nn
import torch

class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.fc1 = nn.Linear(1, 1)

    def forward(self, x):
        # this new tensor is created on the CPU, even after model.to('cuda:0')
        n = torch.range(0, 5)
        print(n.type())
        #x = x.mm(n)
        x = self.fc1(x)
        return x

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(10, 1)
        self.module1 = MyModule()

    def forward(self, x):
        x = self.fc1(x)
        x = self.module1(x)
        return x

model = Net()
model = model.to('cuda:0')
model(torch.zeros(1, 10).to('cuda:0'))
print(model.module1.fc1.weight.type())
print(model.fc1.weight.type())

Output:

torch.FloatTensor
torch.cuda.FloatTensor
torch.cuda.FloatTensor

The new tensor I define in my custom layer is not on CUDA. As a result, I need to pass in device so that I can move it to the device.

I would expect that, by default, new tensors would also be created on CUDA once the model has been moved to CUDA.

OK, this is not a good example, because I could just create it on the same device as x (see the sketch below).
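
For reference, a minimal sketch of that fix: create the helper tensor directly on the input's device inside forward (torch.arange(0., 6.) is used here as the non-deprecated equivalent of torch.range(0, 5)):

import torch
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(1, 1)

    def forward(self, x):
        # create the helper tensor on whatever device the input lives on
        n = torch.arange(0., 6., device=x.device)
        print(n.type())  # torch.cuda.FloatTensor if x is on the GPU
        x = self.fc1(x)
        return x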

But the problem I face is that I have to define the weights myself

self.w = Parameter(torch.zeros(out_features, in_features))

in the __init__ function.

At that point, I still do not know the input device. Therefore my self.w is on the CPU while my input is on the GPU.

Parameters will be moved as well by .to(), so you don't have to worry about the parameter you add.
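
For illustration, a quick sketch of this (assuming a GPU is available; the class name and sizes are made up for this example):

import torch
import torch.nn as nn

class MyLayer(nn.Module):
    def __init__(self, in_features=10, out_features=1):
        super().__init__()
        # manually defined weight; registered because it is an nn.Parameter attribute
        self.w = nn.Parameter(torch.zeros(out_features, in_features))

    def forward(self, x):
        return x @ self.w.t()

layer = MyLayer().to('cuda:0')
print(layer.w.device)  # cuda:0 -- the parameter was moved along with the module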

OK, I found out the problem. It is not with the weight that has been declared as a Parameter. It is with a map that I define in the __init__ function. I need this map to also be put on the same CUDA device as the weight, but I do not want it to be a Parameter.

I tried having these lines

self.w = Parameter(torch.zeros(out_features, in_features))
self.k_map = torch.zeros(self.m+1, device=self.w.device)

Apparently, at initialization the weight is still on the CPU, which causes the map to be created on the CPU as well.

How should I get them both onto CUDA without making k_map a Parameter?

You could override the .cuda() and/or the .to() function for something like this:

class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.a = torch.nn.Parameter(torch.zeros(1))
        # plain tensor attribute, not registered as a parameter or buffer
        self.b = torch.zeros(1)

    def forward(self, inputs):
        # do some stuff
        return inputs

    def cuda(self, device=None):
        self = super().cuda(device)
        self.b = self.b.cuda(device)
        return self

    def to(self, *args, **kwargs):
        self = super().to(*args, **kwargs)
        self.b = self.b.to(*args, **kwargs)
        return self

EDIT: I recommend overriding only the .to() function, because if you override .cuda() you also have to make sure to override .cpu() as well.
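
As a quick usage check of the sketch above (assuming a GPU is available):

net = Net().to('cuda:0')
print(net.a.device)  # cuda:0 (parameter, moved by nn.Module.to)
print(net.b.device)  # cuda:0 (plain tensor, moved by the overridden .to)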

In that case you should use register_buffer.
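
For instance, a minimal sketch of how the layer described above could register k_map as a buffer (the constructor arguments are illustrative, and a GPU is assumed to be available):

import torch
import torch.nn as nn

class MyLayer(nn.Module):
    def __init__(self, in_features, out_features, m):
        super().__init__()
        self.w = nn.Parameter(torch.zeros(out_features, in_features))
        # registered as a buffer: moved by .to()/.cuda(), but not a Parameter
        self.register_buffer('k_map', torch.zeros(m + 1))

    def forward(self, x):
        # both self.w and self.k_map are on the same device as the module here
        return x @ self.w.t()

layer = MyLayer(10, 1, 5).to('cuda:0')
print(layer.w.device, layer.k_map.device)  # cuda:0 cuda:0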


Thanks a lot! register_buffer is indeed what I am looking for.

Sorry if this is a beginner question, but how does register_buffer handle the .to(device) problem here, such that the custom module ends up on the same device as the model it is part of? Thank you so much!

The .to() method will be applied to all internal _parameters and _buffers, as can be seen here and here.

Tensors will be registered in _parameters and _buffers if you assign an nn.Parameter as an attribute or use self.register_buffer. Have a look at this example:

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.my_param = nn.Parameter(torch.randn(1))
        self.register_buffer('my_buffer', torch.randn(1))
        self.neither_param_nor_buf = torch.randn(1)
        
        
model = MyModel()
print(dict(model.named_parameters()))
> {'my_param': Parameter containing:
tensor([-1.1189], requires_grad=True)}
print(dict(model.named_buffers()))
> {'my_buffer': tensor([-0.0459])}

model.to('cuda')
print(dict(model.named_parameters()))
> {'my_param': Parameter containing:
tensor([-1.1189], device='cuda:0', requires_grad=True)}
print(dict(model.named_buffers()))
> {'my_buffer': tensor([-0.0459], device='cuda:0')}
print(model.neither_param_nor_buf.device)
> cpu

As you can see, only the parameters and buffers were moved, while the tensor is still on the CPU.


It works for me. Thank you very much. :smiley: