Function/Class for Generating Variable-Depth Networks

I am working on a project which includes training many neural nets of the same “format” but varying depth. It is thus useful to have a function/class that takes “depth” as an argument (or, more generally, a variable-length list of layer sizes) and returns a neural net of the requisite depth. For example, in the case of simple feedforward ReLU nets, the following function suffices:

import torch.nn as nn

def make_relu_net(layer_sizes):
    # Build a Linear layer for each consecutive pair of sizes,
    # with a ReLU between them but no activation after the last layer.
    stack = []
    for i, input_size in enumerate(layer_sizes[:-1]):
        output_size = layer_sizes[i + 1]
        stack.append(nn.Linear(input_size, output_size))
        if i < len(layer_sizes) - 2:
            stack.append(nn.ReLU())
    return nn.Sequential(*stack)
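
For instance, passing lists of different lengths yields nets of different depths (the layer sizes below are just an illustration):

net3 = make_relu_net([784, 128, 10])            # 2 Linear layers, 1 ReLU
net5 = make_relu_net([784, 256, 128, 64, 10])   # 4 Linear layers, 3 ReLUs
print(net3)
print(net5)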

However, I am wondering how to extend this to more general neural net layouts that don’t necessarily fit the nn.Sequential format. It’s my understanding that nn.Module should be used for more general network construction. Keeping the ReLU-net example above for simplicity, my naive attempt would look like this:

class FeedForwardNet(nn.Module):
    def __init__(self, layer_sizes):
        super().__init__()
        self.stack = []
        for i, input_size in enumerate(layer_sizes[:-1]):
            output_size = layer_sizes[i+1]
            self.stack.append(
                nn.Linear(input_size, output_size))

    def forward(self, x):
        for layer in self.stack[:-1]:
            x = layer(x).clamp(min=0)
        return self.stack[-1](x)

There are a few things that seem problematic to me with the above class:

  1. The main issue is that there’s a for-loop in the forward method. In all the PyTorch code examples of deep nets I’ve seen online, I don’t think I’ve seen any with for-loops in forward. My assumption is that this is because a Python loop would be slow compared to hard-coding the network layout. But my use-case requires variable depth, so I’m not sure how to get around this. Is it okay to have the for-loop there, or is there a better way, e.g., not subclassing nn.Module?

  2. Given that all the layers are stored in self.stack, it seems the state_dict is never populated. Is there a way to generate a proper state_dict that still allows for a variable number of layers as required in my use-case?

  3. Smaller issue: is the proper thing to include ReLU layers in __init__ as attributes of the class (like the Linear layers), or to use clamp in forward as in my code? More generally, for any layer without weights: is it better to compute it on the fly in forward or to make it an attribute in __init__?

  1. The for loop is not a problem, and you shouldn’t see any noticeable Python overhead, as the main workload is in the layer execution, not in evaluating the for loop in Python. The nn.ModuleList documentation even uses a for loop in forward as its example use case.

  2. If you store the modules in a plain Python list, they won’t be registered as submodules, so model.parameters() won’t return their parameters and the state_dict stays empty. Use nn.ModuleList instead; see the sketch after this list.

  3. It depends on your coding style. Stateless modules such as nn.ReLU can also be used through the functional API (F.relu) without having to store anything in __init__. If you register these modules in __init__, you could, e.g., swap in another activation function without touching the forward method (as in the sketch below), but I would still say it’s mostly a coding-style question.
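
Here is a minimal sketch of how the FeedForwardNet class from the question could be rewritten along these lines. It is only one possible layout: the activation argument and the layer sizes in the usage lines are assumptions for illustration, not part of the original question.

import torch
import torch.nn as nn

class FeedForwardNet(nn.Module):
    def __init__(self, layer_sizes, activation=None):
        super().__init__()
        # nn.ModuleList registers each Linear as a submodule, so
        # parameters(), state_dict(), .to(), etc. all see them.
        self.stack = nn.ModuleList(
            nn.Linear(in_size, out_size)
            for in_size, out_size in zip(layer_sizes[:-1], layer_sizes[1:]))
        # Storing the activation as a submodule makes it easy to swap.
        self.activation = activation if activation is not None else nn.ReLU()

    def forward(self, x):
        # Apply the activation after every layer except the last one.
        for layer in self.stack[:-1]:
            x = self.activation(layer(x))
        return self.stack[-1](x)

net = FeedForwardNet([784, 128, 64, 10])
print(list(net.state_dict().keys()))  # stack.0.weight, stack.0.bias, ...
out = net(torch.randn(32, 784))

If you’d rather keep the activation functional, you could drop the activation attribute and call F.relu directly in forward instead.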