Caching a constant torch.eye in an nn.Module and moving it with to(device)

I use torch.eye(.) during the computation of a function. Since it never changes shape, I decided to precompute it once in the constructor instead of generating it on the fly during each forward call. Dummy example:

import torch
import torch.nn as nn

class Foo(nn.Module):
    def __init__(self, device='cuda'):
        super(Foo, self).__init__()
        self.weights = nn.Parameter(torch.Tensor(4, 4))
        # Cache the constant identity matrix once in the constructor
        self.eye = torch.eye(4, device=device, requires_grad=False)
        torch.nn.init.uniform_(self.weights)

    def forward(self):
        return self.weights - self.eye

I only registered weights as an nn.Parameter, so calling model.parameters() won't return self.eye, and I added requires_grad=False to make sure it stays out of the graph.
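
As a quick sanity check (a minimal sketch; instantiated on the CPU just so it runs without a GPU):

model = Foo(device='cpu')
print([name for name, _ in model.named_parameters()])  # ['weights']
print(model.eye.requires_grad)                         # False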

Should I be worried about any unintended behavior this could have in my computational graph?

This solution, in contrast to generating it on the fly:

class Foo(nn.Module):
    def __init__(self, device='cuda'):
        super(Foo, self).__init__()
        self.weights = nn.Parameter(torch.Tensor(4, 4))
        self.device = device
        torch.nn.init.uniform_(self.weights)

    def forward(self):
        # Rebuild the identity matrix on every forward call
        return self.weights - torch.eye(4, device=self.device)
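
For what it's worth, the overhead of rebuilding the matrix can be measured directly (a minimal sketch; CachedFoo and OnTheFlyFoo are hypothetical renames of the two versions above so they can coexist in one script, both kept on the CPU):

import timeit

import torch
import torch.nn as nn

class CachedFoo(nn.Module):
    def __init__(self):
        super(CachedFoo, self).__init__()
        self.weights = nn.Parameter(torch.Tensor(4, 4))
        self.eye = torch.eye(4, requires_grad=False)  # built once
        torch.nn.init.uniform_(self.weights)

    def forward(self):
        return self.weights - self.eye

class OnTheFlyFoo(nn.Module):
    def __init__(self):
        super(OnTheFlyFoo, self).__init__()
        self.weights = nn.Parameter(torch.Tensor(4, 4))
        torch.nn.init.uniform_(self.weights)

    def forward(self):
        return self.weights - torch.eye(4)  # rebuilt every call

print(timeit.timeit(CachedFoo(), number=10000))     # cached identity
print(timeit.timeit(OnTheFlyFoo(), number=10000))   # rebuilt each call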

Another question: when I move my model to a device, e.g.,

model = Foo()
model.to('cuda')

it correctly moves self.weights to the GPU, but not self.eye (PyTorch 1.1.0), so I had to add the device argument manually to make sure it ended up on the right device. Is that because it isn't an nn.Parameter?

Hi,

This is the right way to do it. We call such Tensors buffers.
It won’t have any impact on the computational graph as it does not require gradients.

In your first code sample, though, it should be moved along with the parameters when you do model.to('cuda').

Thank you, @albanD!

About moving to the GPU: I had accidentally left the device argument in. After removing it and calling to(device) again, it still didn't work in PyTorch 1.1.0:

import torch
import torch.nn as nn

class Foo(nn.Module):
    def __init__(self):
        super(Foo, self).__init__()
        self.weights = nn.Parameter(torch.Tensor(4, 4))
        self.eye = torch.eye(4, requires_grad=False)
        torch.nn.init.uniform_(self.weights)

    def forward(self):
        return self.weights - self.eye

bar = Foo()
bar = bar.to('cuda')
print(bar.weights.device)  # cuda:0
print(bar.eye.device)      # cpu
bar()                      # raises a device-mismatch RuntimeError

I also tried PyTorch 1.3.0, and it doesn't move self.eye to the GPU there either, only self.weights.

I’m currently keeping the device argument, just in case.

Oh, my bad actually. You have to register it explicitly:

self.register_buffer('eye', torch.eye(4, requires_grad=False))

instead of self.eye = xxx.
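
For completeness, here is the fixed version of the class (a minimal sketch; the last lines assume a CUDA device is available):

import torch
import torch.nn as nn

class Foo(nn.Module):
    def __init__(self):
        super(Foo, self).__init__()
        self.weights = nn.Parameter(torch.Tensor(4, 4))
        # A buffer is moved by .to()/.cuda() and saved in the state_dict,
        # but still excluded from model.parameters()
        self.register_buffer('eye', torch.eye(4, requires_grad=False))
        torch.nn.init.uniform_(self.weights)

    def forward(self):
        return self.weights - self.eye

bar = Foo().to('cuda')
print(bar.weights.device)  # cuda:0
print(bar.eye.device)      # cuda:0 -- the buffer now moves along
bar()                      # no more device-mismatch error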
