Keeping constant value in module on correct device?

What’s the appropriate way to keep a constant that’s part of a module on the correct device? If I define a constant Variable for use in a module, calling the module’s cuda() method will not move the constant to the GPU. However, if I make the constant a Parameter that does not require gradients, then turning gradients off and back on for the entire module flips requires_grad for that parameter as well, so it stops being a non-trained constant.

As an example, I have the following module, which is meant to scale the input by a trainable positive value (using a trainable exponent):

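import math

import torch
from torch.autograd import Variable
from torch.nn import Module, Parameter

class MyModule(Module):
    def __init__(self):
        super().__init__()
        self.exponent = Parameter(torch.Tensor([1]))
        # Plain attribute: the module does not track it, so cuda() will not move it.
        self.e = Variable(torch.Tensor([math.e]), requires_grad=False)

    def forward(self, x):
        x = x * (self.e.pow(self.exponent))
        return x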

Here e should remain constant. But if I want to be able to move this module off and on the GPU without trouble, what’s the best way to handle this? I know I could also override the cuda() method of the module, but this still requires having a special case for any constant I use. Is there a better way? Thank you!

Edit: Actually, I’ve found that overriding cuda() for the module does not work well, because if this module is included as part of another module, the larger module does not call the submodule’s cuda() method, and so does not trigger the override.
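
For anyone who wants to reproduce the two problems described above, here is a minimal sketch. The class names PlainAttr and FrozenParam are made up for illustration, and .double() is used as a stand-in for .cuda() so the snippet runs without a GPU. This assumes that module-wide dtype conversion skips untracked attributes in the same way module-wide device moves do.

```python
import math

import torch
from torch import nn


class PlainAttr(nn.Module):
    """Hypothetical variant: the constant is a plain tensor attribute."""

    def __init__(self):
        super().__init__()
        self.exponent = nn.Parameter(torch.tensor([1.0]))
        self.e = torch.tensor([math.e])  # not tracked by the module


class FrozenParam(nn.Module):
    """Hypothetical variant: the constant is a Parameter with requires_grad=False."""

    def __init__(self):
        super().__init__()
        self.exponent = nn.Parameter(torch.tensor([1.0]))
        self.e = nn.Parameter(torch.tensor([math.e]), requires_grad=False)


# Problem 1: module-wide conversions only touch parameters and buffers.
plain = PlainAttr().double()
plain_dtypes = (plain.exponent.dtype, plain.e.dtype)
print(plain_dtypes)  # the parameter was converted, the plain attribute was not

# Problem 2: toggling gradients for the whole module also un-freezes the constant.
frozen = FrozenParam()
frozen.requires_grad_(False)
frozen.requires_grad_(True)
flipped = frozen.e.requires_grad
print(flipped)
```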

You could use register_buffer for automatic handling of your constant.

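import numpy as np
import torch
import torch.nn as nn
from torch.autograd import Variable

class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.exponent = nn.Parameter(torch.Tensor([1]))
        self.e = torch.Tensor([np.e]).float()
        # Buffers follow the module through cuda()/cpu() and are saved in state_dict.
        self.register_buffer('e_const', self.e)

    def forward(self, x):
        x = x * (Variable(self.e_const).pow(self.exponent))
        return x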

Since the Variable API is now deprecated, how do I set requires_grad=True on a buffer, or is it unnecessary?

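A buffer shouldn’t require gradients. If your tensor should require gradients, use nn.Parameter instead (requires_grad is True by default). More generally, you can now pass the requires_grad argument directly when creating a tensor, e.g. torch.tensor(1., requires_grad=True). Note that the value has to be floating point, since integer tensors cannot require gradients.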
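
To illustrate the difference between the two, here is a small sketch. The module name Scale and its attributes weight and offset are invented for this example and do not come from the thread. The parameter receives a gradient after backward(), while the buffer is listed separately and stays out of autograd.

```python
import torch
from torch import nn


class Scale(nn.Module):
    """Hypothetical module with one trainable parameter and one constant buffer."""

    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.tensor([2.0]))       # trainable
        self.register_buffer("offset", torch.tensor([1.0]))   # constant

    def forward(self, x):
        return x * self.weight + self.offset


m = Scale()
m(torch.tensor([3.0])).sum().backward()

param_names = [name for name, _ in m.named_parameters()]
buffer_names = [name for name, _ in m.named_buffers()]

print(param_names, m.weight.grad)              # only the parameter gets a gradient
print(buffer_names, m.offset.requires_grad)    # the buffer does not require gradients
```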

This way, we have two names for the same thing: e and e_const. Is this still the recommended way?

You don’t have to assign self.e if it’s never used. You can just register the buffer once under a single name and access it through that name.
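
A possible version of the module from the question with a single name and without Variable might look like the sketch below. The nn.Sequential wrapper and the device selection are assumptions added only to mimic the nested-module case from the original question; the snippet falls back to the CPU when no GPU is available.

```python
import math

import torch
from torch import nn


class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.exponent = nn.Parameter(torch.tensor([1.0]))
        # Registered once; afterwards the buffer is reachable as self.e.
        self.register_buffer("e", torch.tensor([math.e]))

    def forward(self, x):
        return x * self.e.pow(self.exponent)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Wrap the module in a parent to check that the buffer moves with the parent.
outer = nn.Sequential(MyModule()).to(device)
inner = outer[0]
y = outer(torch.ones(2, device=device))

# Toggling gradients on the whole model only affects parameters, not buffers.
outer.requires_grad_(False)
outer.requires_grad_(True)

print(inner.e.device, inner.e.requires_grad, inner.exponent.requires_grad)
```

With this layout there is no special case per constant: .to(device), .cuda() and .cpu() on any parent module carry the buffer along, and it is also saved in state_dict().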