Pattern for tunable number of layers (functorch friendly if possible)

In the process of hyperparameter tuning, I want to be able to try the model with different numbers of layers. Someone asked a similar question in "Create a network where the number/type of layers are given as parameter input", but it didn't have an answer.

Is there a canonical way to do this that has the highest performance (and, hopefully, is set up to work well with functorch's make_functional)?

For example, for a deep sets architecture we have something like:

import torch.nn as nn
from copy import deepcopy

n_in = 3  # set for problem
n_out = 2  # set for problem
hidden_dim = 128  # TUNED!
activator = nn.ReLU()  # TUNED!  Maybe nn.Tanh/etc.
last_activator = nn.Softplus()  # TUNED!  Maybe just nn.Identity/etc.
layers = 4  # TUNED!

model = nn.Sequential(
    nn.Linear(n_in, hidden_dim),
    activator,
    # Add in layers - 1 hidden blocks
    *[
        nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            deepcopy(activator),
        )
        for i in range(layers - 1)
    ],
    nn.Linear(hidden_dim, n_out, bias=True),
    last_activator,
)
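
One alternative I've been considering is flattening this into a small factory function, so the layer count is just another argument (a rough sketch; make_mlp is only an illustrative name):

import torch.nn as nn
from copy import deepcopy

def make_mlp(n_in, n_out, hidden_dim, layers, activator, last_activator):
    # build the same architecture as above from the tuned hyperparameters
    blocks = [nn.Linear(n_in, hidden_dim), deepcopy(activator)]
    for _ in range(layers - 1):
        blocks += [nn.Linear(hidden_dim, hidden_dim), deepcopy(activator)]
    blocks += [nn.Linear(hidden_dim, n_out, bias=True), last_activator]
    return nn.Sequential(*blocks)

model = make_mlp(n_in=3, n_out=2, hidden_dim=128, layers=4,
                 activator=nn.ReLU(), last_activator=nn.Softplus())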

Is there a pattern with the highest performance for this sort of flexible-layer setup, or is the performance identical due to tracing? What about if we try to use make_functional and apply this across batches? Or even with future incarnations of functorch's aot_function?
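
Concretely, the usage I have in mind is roughly the following sketch (assuming the standard make_functional/vmap workflow; shapes are just placeholders):

import torch
from functorch import make_functional, vmap

# turn the nn.Module into a pure function of (params, input)
fmodel, params = make_functional(model)

# map over the batch dimension of the inputs while sharing the parameters
x = torch.randn(64, n_in)
y = vmap(fmodel, in_dims=(None, 0))(params, x)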