I’m overriding cuda() in a custom module, and for some reason it’s not getting called from other modules containing my module. Here’s a minimal(ish) example:
```python
import torch
from torch import nn

class Layer(nn.Module):
    def __init__(self):
        super().__init__()
        self.use_cuda = False

    def cuda(self, device_id=None):
        print(' CUDA')
        self.use_cuda = True
        return super().cuda(device_id)

class Outer(nn.Module):  # Module containing a Layer
    def __init__(self):
        super().__init__()
        self.inner = Layer()

print('sequential model:')
model = nn.Sequential(Layer())
model.cuda()  # ! nothing printed

print('module containing Layer:')
model = Outer()
model.cuda()  # ! nothing printed

print('calling apply myself:')
model = Outer()
model.apply(lambda t: t.cuda())

print('calling cuda() directly:')
model = Layer()
model.cuda()
```
This leads to the following output:

```
sequential model:
module containing Layer:
calling apply myself:
 CUDA
calling cuda() directly:
 CUDA
```
The difference seems to be that `_apply()` (which `cuda()` uses under the hood) never calls the passed function on submodules themselves: it applies the function only to parameters and buffers, and recurses into children via `_apply()` rather than via their `cuda()`. So an overridden `cuda()` on a nested module is simply bypassed.
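To make the mechanism concrete, here is a stripped-down, pure-Python sketch of that recursion (hypothetical `MiniModule`/`MiniLayer` classes standing in for `nn.Module`, strings standing in for tensors; this is not the actual torch source):

```python
class MiniModule:
    def __init__(self):
        self.children = []
        self.params = []

    def _apply(self, fn):
        for child in self.children:
            child._apply(fn)  # note: recurses via _apply, never child.cuda()
        self.params = [fn(p) for p in self.params]
        return self

    def cuda(self):
        # fn is applied only to "tensors" (strings here), not to modules
        return self._apply(lambda p: p + ':cuda')

class MiniLayer(MiniModule):
    def cuda(self):
        print('CUDA')  # never reached when a *parent* is cuda'd
        return super().cuda()

outer = MiniModule()
inner = MiniLayer()
inner.params = ['w']
outer.children.append(inner)

outer.cuda()        # prints nothing: inner.cuda() is bypassed
print(inner.params)  # → ['w:cuda']
```

The parameters still get moved (the lambda reaches them through `_apply`), which is why everything works silently; only the per-module hook you tried to add is skipped.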
Is this the desired behavior? My use case is that I'm creating some tensors on the fly, whose shape depends on the input. I need to keep track of whether the module has been moved to the GPU so I can create the right kind of tensor. Is there a better way to do this?
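One alternative to tracking a `use_cuda` flag by hand is to let the module carry its own device information: register an empty buffer, which `cuda()`/`to()` will move along with the parameters, and derive new tensors from it. A minimal sketch (the buffer name `_ref` is my own choice, not a torch convention):

```python
import torch
from torch import nn

class Layer(nn.Module):
    def __init__(self):
        super().__init__()
        # empty buffer: carries device/dtype but no data; moved by cuda()/to()
        self.register_buffer('_ref', torch.empty(0))

    def forward(self, x):
        # shape depends on the input; device and dtype follow the module
        scratch = self._ref.new_zeros(x.size(0), 4)
        return scratch

layer = Layer()
out = layer(torch.randn(3, 2))
print(out.device, out.shape)  # → cpu torch.Size([3, 4]) before any .cuda()
```

If the on-the-fly tensor should instead follow the *input*, `x.new_zeros(...)` achieves the same thing without any buffer. Overriding `_apply()` (rather than `cuda()`) is another option if you really need a per-module hook, though it relies on an underscore-prefixed method.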