Model parameters are not shown by the model.parameters() generator unless the modules are initialized in the constructor

Is there a way to define a model without initializing every module in the constructor?
It seems that this way model.parameters() returns an empty list and would not work for the optimizer.
Is there a way to make this work by creating and calling the modules dynamically in the forward method?

import torch.nn as nn


class TransformerEncoder(nn.Module):

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(3, 32, bias=False)

    def forward(self, x):
        # x = nn.Linear(x.shape[1], 32, bias=False)(x)
        x = self.linear(x)
        # the modules below are created anew on every call and are therefore not registered
        x = nn.Dropout(0.1)(x)
        x = nn.Linear(32, 64, bias=False)(x)
        x = nn.Dropout(0.01)(x)
        x = nn.Linear(64, 128)(x)
        x = nn.ReLU()(x)
        x = nn.Linear(128, 32)(x)
        x = nn.Dropout(0.3)(x)

        return x
model = TransformerEncoder()
list(model.parameters())
[Parameter containing:
 tensor([[-0.4792,  0.5095,  0.0218],
         [-0.4992, -0.0950, -0.1015],
         [-0.4775,  0.4182,  0.3103],
         [-0.5134, -0.3572, -0.4396],
         [ 0.0881, -0.0877, -0.2812],
         [ 0.3770,  0.4974,  0.4990],
         [ 0.0478, -0.3706, -0.1895],
         [-0.3901, -0.3802,  0.3413],
         [ 0.4509, -0.0291, -0.1843],
         [ 0.3662,  0.1571, -0.3094],
         [ 0.3420,  0.2866,  0.2167],
         [ 0.5187,  0.2042, -0.5334],
         [-0.0020,  0.2633,  0.4540],
         [ 0.5211,  0.4651,  0.5746],
         [ 0.3727,  0.5121,  0.1946],
         [ 0.4460, -0.0212, -0.2560],
         [ 0.1041,  0.1689, -0.2890],
         [-0.1101, -0.0870,  0.0905],
         [ 0.1388,  0.3151, -0.3811],
         [ 0.3124, -0.4644,  0.2603],
         [ 0.4458, -0.1589,  0.4804],
         [-0.2198, -0.3407,  0.1858],
         [ 0.3404, -0.1001, -0.1115],
         [-0.5361,  0.5403, -0.1926],
         [-0.1439, -0.2009,  0.2738],
         [-0.3845, -0.5399,  0.3298],
         [-0.1042,  0.0114,  0.2496],
         [ 0.3915, -0.0160, -0.2393],
         [ 0.0127,  0.1600,  0.1321],
         [-0.2192,  0.1396,  0.5114],
         [-0.4857,  0.2361, -0.5637],
         [-0.1308, -0.1101, -0.2792]], requires_grad=True)]

This is expected behavior since you are not registering the modules.
The newly created layers won’t be trained and will use randomly initialized parameters.
You could assign these layers to self in the forward and add a condition so that the already initialized modules are reused in future iterations, but you would also need to perform a forward pass before passing the parameters to the optimizer. I don’t fully understand your use case here. If you don’t want to define the in_features manually, use the nn.Lazy* modules.
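Something like this minimal sketch of that approach (the attribute names and layer sizes are placeholders, not taken from your model):

import torch
import torch.nn as nn


class DynamicEncoder(nn.Module):
    def forward(self, x):
        # create and register the layers on the first call only;
        # later calls reuse the registered modules so their parameters are trained
        if not hasattr(self, "fc1"):
            self.fc1 = nn.Linear(x.shape[-1], 32, bias=False)
            self.fc2 = nn.Linear(32, 64, bias=False)
        x = self.fc1(x)
        return self.fc2(x)


model = DynamicEncoder()
model(torch.randn(4, 3))  # dummy forward pass creates and registers the layers
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # parameters are now populated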

Shouldn’t the modules in the forward be registered automatically? Is there a specific reason why we have to register them in the constructor?
This way we have to write the same thing twice: first create a module instance in the constructor and then call that instance in the forward.

The naive question is: why not create and call the object at the same time, i.e. nn.Linear(fan_in, fan_out)(x)?

Or, something like a context manager?

    def forward(self, x):
        with torch.registered_modules():
            x = nn.Linear(x.shape[1], 32, bias=False)(x)
            x = nn.Linear(32, 64, bias=False)(x)
        x = nn.Dropout(0.01)(x)
        x = nn.Linear(64, 128)(x)
        x = nn.ReLU()(x)
        x = nn.Linear(128, 32)(x)
        x = nn.Dropout(0.3)(x)

        return x

Then model.parameters() would contain the parameters of the 2 linear modules created inside the context manager.

You’re implying that there’s a way to achieve this, but I didn’t quite get it. Could you elaborate with an MWE?

Another scenario where what I’m proposing might be interesting is the following.
You want a model that works with any input, but to construct it you’d need to create the modules dynamically based on the input dimensions. What if you don’t know the shape of your data a priori? When we register modules in the constructor, we have to know the data shape in advance to decide the fan_in and fan_out variables, whereas in the forward we can read it directly from the shape of the input x. So wouldn’t it make sense to have modules registered in the forward?

class SomeModel(nn.Module):

    def __init__(self, config):
        super().__init__()
        self._config = config
        self.linear1 = nn.Linear(fan_in, fan_out)  # Here we need to know a priori the fan_in dim

    def forward(self, x):
        self.linear1 = nn.Linear(x.shape[-1], fan_out)  # Instead, here we directly get it from the data

I guess LazyLinear is one solution for inferring input shapes. I wasn’t aware of it, thanks for bringing it up. It’s a nice addition; I hope it becomes more mainstream and the warning about it being a developmental feature slowly fades away in the future.
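For reference, a minimal sketch of how nn.LazyLinear could be used for this (the sizes here are arbitrary); as far as I understand, a dummy forward pass is needed before creating the optimizer, since the lazy parameters are uninitialized until then:

import torch
import torch.nn as nn


class LazyEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = nn.LazyLinear(32, bias=False)  # in_features is inferred at the first call
        self.linear2 = nn.Linear(32, 64, bias=False)

    def forward(self, x):
        return self.linear2(self.linear1(x))


model = LazyEncoder()
model(torch.randn(4, 3))  # dry run materializes the lazy parameters
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)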

No, and I wouldn’t know how that should work, e.g. which attribute names should be used?
You can explicitly register modules in the forward as already explained, but you are not doing that either.

Simplicity, as you can finish the setup (including the optimizer) before executing the model.

Splitting the code into initialization and execution creates clean and understandable code. It e.g. allows you to reuse modules easily and is the standard approach in PyTorch.
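For example, a small sketch of the module reuse this split enables (the sizes are arbitrary): a module created once in __init__ shares its weights across multiple calls in the forward, and the optimizer can be created before the model ever sees data:

import torch
import torch.nn as nn


class TiedMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.shared = nn.Linear(32, 32)  # a single set of weights
        self.act = nn.ReLU()

    def forward(self, x):
        # the same module (and thus the same parameters) is applied twice
        x = self.act(self.shared(x))
        return self.shared(x)


model = TiedMLP()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # can be created before any forward pass
out = model(torch.randn(4, 32))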

Already explained:

Can you point me to a reference or an example of that? Where should I look?