Why does nn.Module.cuda() use _apply() instead of apply()?

I’m overriding cuda() in a custom module, but it never gets called when I call cuda() on a module that contains mine. Here’s a minimal(ish) example:

import torch
from torch import nn

class Layer(nn.Module):
    def __init__(self):
        super().__init__()
        self.use_cuda = False

    def cuda(self, device=None):
        print('  CUDA')
        self.use_cuda = True
        return super().cuda(device)

class Outer(nn.Module): # module containing a Layer
    def __init__(self):
        super().__init__()
        self.inner = Layer()

print('sequential model:')

model = nn.Sequential(Layer())
model.cuda() # ! nothing printed

print('module containing Layer:')

model = Outer()
model.cuda() # ! nothing printed

print('calling apply myself:')

model = Outer()
model.apply(lambda m: m.cuda()) # apply() passes modules, so m is a module here

print('calling cuda() directly:')

model = Layer()
model.cuda()

This leads to the following output:

sequential model:
module containing Layer:
calling apply myself:
  CUDA
calling cuda() directly:
  CUDA

The difference seems to be that _apply() (which is used under the hood by cuda()) doesn’t call the passed function on the module itself or on its submodules; it only applies it to their parameters and buffers. apply(), by contrast, calls the function on every submodule, including the module itself.
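For reference, here is a simplified sketch of the two methods (condensed from the PyTorch source; the real _apply() also converts gradients and does some extra bookkeeping):

def apply(self, fn):
    for module in self.children():
        module.apply(fn)
    fn(self)  # fn is called on the module itself, so an override is visible here
    return self

def _apply(self, fn):
    for module in self.children():
        module._apply(fn)  # recurses via _apply(), never calling cuda() on children
    for param in self._parameters.values():
        if param is not None:
            param.data = fn(param.data)  # fn only ever sees tensors
    for key, buf in self._buffers.items():
        if buf is not None:
            self._buffers[key] = fn(buf)
    return self

So model.cuda() turns into a single top-level _apply(lambda t: t.cuda(device)) call, and the lambda is only ever applied to parameter and buffer tensors, never to your module.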

Is this desired behavior? My use case is that I’m creating some tensors on the fly, whose shape depends on the input. I need to keep track of whether the module has been cuda’d, so that I can create the new tensors on the right device. Is there a better way to do this?

From the look of the code, cuda() is never called on submodules. Instead, the conversion function is applied directly to the parameters (and buffers) of each module.

So you can track the cuda status of your layer through its parameters.
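For example, something like this (a minimal sketch; it assumes the layer has at least one parameter to inspect):

import torch
from torch import nn

class Layer(nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.zeros(3))

    def forward(self, x):
        # Read the device off a parameter instead of tracking a use_cuda flag;
        # this stays correct after .cuda(), .cpu(), or .to(device).
        device = next(self.parameters()).device
        extra = torch.zeros(x.shape[0], 3, device=device)  # created on the fly
        return x + self.weight + extra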