How to create new tensors in a neural network without knowing the device?

Hi, I want to implement some network modules like Conv2D that can be used in different models and on different devices. Similar to Conv2D, these modules also hold variables such as weights/biases.

However, it seems that when I construct a torch.Tensor, it assumes or requires a device argument, which cannot be decided at that point. I have tried the .to function, but it silently has no effect (no warnings). I think the variables should be defined in the __init__ function rather than recreated in the forward function on every forward pass, so input.device cannot be used.

So my question is: how do I create/construct variables in a subclass of nn.Module that are device-agnostic?
This is probably simple, but it really confuses me. Can anyone help? Thank you!

By device, do you mean cuda and cpu?

If yes, you can use the following line to detect it:

torch.cuda.is_available()

This returns a Boolean value depending on whether or not CUDA is available, so you can use if statements to look for devices, and this should make it work in both environments.

For example:

cuda = torch.cuda.is_available()

tensor = some_values  # placeholder for your data
if cuda:
    tensor = tensor.cuda()  # .cuda() returns a new tensor, so reassign it
input = Variable(tensor)
output = model(input)
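A slightly more flexible variant of the same idea (just a sketch; some_values and model are placeholders as above) is to build a torch.device once and pass it to .to(), which works unchanged on both CPU and GPU:

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tensor = some_values        # placeholder, as above
tensor = tensor.to(device)  # no-op on CPU, copies to the GPU otherwise
model = model.to(device)    # assumes model is defined elsewhere
output = model(tensor)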

Thank you for such a quick reply. The device does mean cuda and cpu.

But

torch.cuda.is_available()

seems like a global configuration check. There could be situations where different parts of a neural network containing the same kinds of modules (say Conv2D) run on different devices, controlled in the main function for training and testing. In my situation, I would like an RNN with custom attention to run on the CPU and a CNN with the same kind of custom attention to run on the GPU, in the same network. So I don't think the tensor's device can be decided by a global configuration function.
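To make it concrete, here is a rough sketch of what I mean (the attention module here is hypothetical and just a placeholder):

import torch

class MyAttention(torch.nn.Module):
    # hypothetical custom attention; its internal tensors are exactly
    # what I don't know how to create without fixing a device up front
    def __init__(self):
        super().__init__()

    def forward(self, x):
        return x

attn_for_rnn = MyAttention()          # this instance should live on the CPU
attn_for_cnn = MyAttention().cuda()   # this instance should live on the GPU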

I am a beginner with PyTorch, so please correct me if I'm wrong.

That's correct, you would have to implement the logic yourself. It would look like this:

class MyModule(torch.nn.Module):
    def __init__(self, *args, **kwargs):
        super().__init__()
        self.module_gpu = torch.nn.Conv2d(3, 3, 3, padding=1)
        self.module_cpu = torch.nn.Conv2d(3, 3, 3, padding=1)

    def push_to_devices(self):
        self.module_gpu = self.module_gpu.cuda()
        self.module_cpu = self.module_cpu.cpu()

    def forward(self, x):
        x = x.cuda()
        x = self.module_gpu(x)
        x = x.cpu()
        return self.module_cpu(x)


module = MyModule()
inp = torch.rand(1, 3, 64, 64)  # note: 'in' is a reserved keyword in Python, so use a different name
print(inp.device)
module.push_to_devices()
out = module(inp)
print(out.device)

Note: You would have to call module.push_to_devices() after every change of the network's devices, since such a change would also move these layers. If you don't have to do it like this, I would recommend not doing it, since this usually leads to errors and your code is not device-agnostic (i.e. it won't work without a GPU unless you change the code).
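If you really need to create temporary tensors inside a custom module without fixing a device in advance, one common device-agnostic pattern (just a sketch, not your exact use case) is to derive the device from the input inside forward:

import torch

class DeviceAgnosticConv(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 3, 3, padding=1)

    def forward(self, x):
        # temporary tensors are created on whatever device the input lives on,
        # so the module works unchanged on CPU and GPU
        bias = torch.zeros(x.shape[0], device=x.device, dtype=x.dtype)
        return self.conv(x) + bias.view(-1, 1, 1, 1)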

Thank you for the reply. Actually, the situation of a network distributed across multiple devices serves as the motivation for device-agnostic modules.

My objective is more like implementing a custom Conv2D. Here is a toy example:

class Toy(torch.nn.Module):
    def __init__(self):
        super(Toy, self).__init__()
        self.var = torch.ones([], requires_grad=True)

    def forward(self, x):
        print(self.var.device)
        return self.var * x

toy = Toy()
print(toy.var.device) #cpu 
a = torch.ones([],device='gpu')
print(toy(a).device) #cpu 
toy.to('cuda')
print(toy(a).device) #cpu 
toy.cuda()
print(toy(a).device) #cpu 

On the other hand, the parameters of the Linear module, for example, are pushed to cuda after running .to('cuda'):

l = torch.nn.Linear(3, 4)
print(l.weight.device) #cpu 
l.to('cuda')
print(l.weight.device) #cuda

Should I override the to function to implement this?

Yes, overriding the to function solves my problem. For the previous example, the overriding code is as follows:

class Toy(torch.nn.Module):
    def __init__(self):
        super(Toy, self).__init__()
        self.var = torch.ones([], requires_grad=True)

    def forward(self, x):
        print(self.var.device)
        return self.var * x

    def to(self, device):
        super(Toy, self).to(device)
        self.var = self.var.to(device)  # move the plain tensor attribute manually
        return self  # nn.Module.to returns the module, so keep that contract
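A quick check (assuming a CUDA device is available) shows that var now moves together with the module:

toy = Toy()
print(toy.var.device)  # cpu
toy.to('cuda')
print(toy.var.device)  # cuda:0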

No, you shouldn’t. You have to register the variable as a Parameter if it requires grad or as a buffer if it does not. I modified your toy example to show both cases:

class Toy(torch.nn.Module):
    def __init__(self):
        super(Toy, self).__init__()
        # learnable scalar: registered as a Parameter, so it is returned by
        # .parameters() and moved by .to() / .cuda()
        self.register_parameter("var_param", torch.nn.Parameter(torch.ones([], requires_grad=True)))

        # non-learnable scalar: registered as a buffer, also moved by .to() / .cuda()
        self.register_buffer("var_buffer", torch.ones([], requires_grad=False))


    def forward(self, x):
        print(self.var_param.device)
        return self.var_param * x + self.var_buffer


toy = Toy()
print(toy.var_param.device) 
a = torch.ones([], device='cuda')
print(toy(a).device)  
toy.to('cuda')
print(toy(a).device)   
toy.cuda()
print(toy(a).device)  

Note: I changed the line a = torch.ones([], device='gpu') since "gpu" is not a valid device specifier but "cuda" is.
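For reference, a few valid ways to specify a device:

import torch

torch.device('cpu')
torch.device('cuda')    # the current CUDA device
torch.device('cuda:0')  # an explicit device index
a = torch.ones([], device='cuda:0')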

For further information, have a look at the documentation of register_buffer, register_parameter and torch.nn.Parameter.


Thank you very much! It’s a much more elegant solution.