What’s the appropriate way to keep a constant that is part of a module on the correct device? If I define the constant as a Variable, calling the module’s cuda() method will not move it to the GPU. However, if I instead make the constant a Parameter with requires_grad=False, then toggling requires_grad for all of the module’s parameters (e.g. to freeze and unfreeze it) flips that flag for the constant as well, so it stops being a non-trained constant.
As an example, I have the following module, which is meant to scale its input by a trainable positive value (using a trainable exponent):
import math
import torch
from torch.autograd import Variable
from torch.nn import Module, Parameter

class MyModule(Module):
    def __init__(self):
        super().__init__()
        self.exponent = Parameter(torch.Tensor([1]))  # trainable exponent
        # constant base: a plain Variable, so cuda() will not move it
        self.e = Variable(torch.Tensor([math.e]), requires_grad=False)

    def forward(self, x):
        x = x * self.e.pow(self.exponent)
        return x
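Concretely, this is what goes wrong when I move it to the GPU (a minimal sketch; the exact error text depends on the PyTorch version):

m = MyModule().cuda()
x = Variable(torch.ones(3).cuda())
print(m.exponent.data.is_cuda)  # True  -- the Parameter was moved
print(m.e.data.is_cuda)         # False -- the constant is still on the CPU
y = m(x)                        # raises a CPU/GPU tensor type mismatch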
Here e should remain constant. But if I want to be able to move this module off and on the GPU without trouble, what’s the best way to handle it? I know I could also override the module’s cuda() method, but that still requires a special case for every constant I use. Is there a better way? Thank you!
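For reference, here is how the Parameter workaround I mentioned falls apart (a sketch, assuming e were instead declared as Parameter(torch.Tensor([math.e]), requires_grad=False) in the module above):

m = MyModule()
# freeze the whole module, then unfreeze it, e.g. around a fine-tuning stage
for p in m.parameters():
    p.requires_grad = False
for p in m.parameters():
    p.requires_grad = True
print(m.e.requires_grad)  # True -- the "constant" is trainable again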
Edit: Actually, I’ve found that overriding cuda() for the module does not work well, because if this module is included as part of another module, the larger module does not call the submodule’s cuda() method, and so the override never gets triggered.
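For completeness, the override I tried looked roughly like this (just a sketch; device/async argument handling omitted):

class MyModule(Module):
    # ... __init__ and forward as above ...
    def cuda(self, device=None):
        # move the constant by hand before the usual parameter move
        self.e = self.e.cuda(device)
        return super().cuda(device)

As far as I can tell, calling cuda() on a parent module goes through _apply() on its children rather than calling each child’s cuda(), so this override is simply skipped and e stays on the CPU.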