I’m overriding cuda() in a custom module, and for some reason it isn’t called when I call cuda() on a module that contains mine. Here’s a minimal(ish) example:
```python
import torch
from torch import nn

class Layer(nn.Module):
    def cuda(self, device_id=None):
        print('    CUDA')
        self.use_cuda = True
        super().cuda(device_id)

    def __init__(self):
        super().__init__()
        self.use_cuda = False

class Outer(nn.Module):  # Module containing a Layer
    def __init__(self):
        super().__init__()
        self.inner = Layer()

print('sequential model:')
model = nn.Sequential(Layer())
model.cuda()  # ! nothing printed

print('module containing Layer:')
model = Outer()
model.cuda()  # ! nothing printed

print('calling apply myself:')
model = Outer()
model.apply(lambda t: t.cuda())

print('calling cuda() directly:')
model = Layer()
model.cuda()
```
This leads to the following output:
```
sequential model:
module containing Layer:
calling apply myself:
    CUDA
calling cuda() directly:
    CUDA
```
The difference seems to be that _apply() (which is what cuda() uses under the hood) doesn’t call the passed function on the module itself, but only applies it to the module’s parameters and buffers, recursing into children through their _apply() rather than through their cuda() overrides. The public apply(), by contrast, does call the function on every submodule, which is why the lambda version works.
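As far as I can tell, nn.Module dispatches cuda() roughly along these lines (a simplified sketch, not the actual PyTorch source; SketchModule is just a stand-in name):

```python
# Minimal sketch of how nn.Module dispatches cuda() internally.
# The real _apply() also moves this module's parameters and buffers.
class SketchModule:
    def __init__(self):
        self._modules = {}

    def children(self):
        return self._modules.values()

    def cuda(self, device_id=None):
        # cuda() just delegates to _apply() with a tensor-level function
        return self._apply(lambda t: t.cuda(device_id))

    def _apply(self, fn):
        for module in self.children():
            module._apply(fn)  # recursion goes through _apply(), never the child's cuda()
        # ... fn is then applied to this module's own parameters and buffers
        return self

    def apply(self, fn):
        for module in self.children():
            module.apply(fn)
        fn(self)  # the public apply() calls fn on the module object itself
        return self
```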
Is this the desired behavior? My use case is that I’m creating some tensors on the fly, whose shape depends on the input, so I need to keep track of whether the module has been cuda’d in order to create the right kind of tensor (a sketch of the intended usage is below). Is there a better way to do this?
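To make the use case concrete, the flag would be used roughly like this (the forward body is a made-up example, not my real code):

```python
import torch
from torch import nn

class Layer(nn.Module):
    def __init__(self):
        super().__init__()
        self.use_cuda = False

    def cuda(self, device_id=None):
        self.use_cuda = True
        return super().cuda(device_id)

    def forward(self, x):
        # Hypothetical on-the-fly tensor whose shape depends on the input batch
        mask = torch.ones(x.size(0), x.size(1))
        if self.use_cuda:
            mask = mask.cuda()  # without the flag this stays on the CPU and the multiply fails
        return x * mask
```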