Pattern for tunable number of layers (functorch friendly if possible)

In the process of hyperparameter tuning I want to be able to try the model with different numbers of layers. Someone asked a similar question in Create a network where the number/type of layers are given as parameter input but it didn’t have an answer.

Is there a canonical, high-performance way to do this (and, hopefully, one that is set up to work well with functorch’s make_functional)?

For example, for a deep sets architecture we have things like

n_in = 3 # set for problem
n_out = 2 # set for problem
hidden_dim = 128 # TUNED!
activator = nn.ReLU() # TUNED!  Maybe nn.Tanh/etc.
last_activator = nn.Softplus() # TUNED!  Maybe just identity/etc.
layers = 4 # TUNED!

model = nn.Sequential(
    nn.Linear(n_in, hidden_dim),
    # Add in layers - 1 hidden blocks
    *[nn.Linear(hidden_dim, hidden_dim) for i in range(layers - 1)],
    nn.Linear(hidden_dim, n_out, bias=True),
)

Is there a pattern with the highest performance for this sort of flexible-layer construction, or is the performance identical due to tracing? What about if we try to use make_functional and apply this across batches? Or even try it with future incarnations of functorch’s aot_function?
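For concreteness, here is a sketch of what I have in mind: a small factory that builds the variable-depth stack (the `build_model` helper name is just for illustration), then converting it with make_functional so the parameters are passed explicitly. This assumes the functorch that ships bundled with recent PyTorch versions:

```python
import torch
import torch.nn as nn
from functorch import make_functional  # bundled with torch >= 1.13

def build_model(n_in, n_out, hidden_dim, layers, activator, last_activator):
    # Hypothetical helper: assembles the tunable-depth deep-sets style stack.
    blocks = [nn.Linear(n_in, hidden_dim), activator]
    for _ in range(layers - 1):  # "layers" is a tuned hyperparameter
        blocks += [nn.Linear(hidden_dim, hidden_dim), activator]
    blocks += [nn.Linear(hidden_dim, n_out), last_activator]
    return nn.Sequential(*blocks)

model = build_model(3, 2, 128, 4, nn.ReLU(), nn.Softplus())

# Convert to a functional form: parameters become an explicit argument.
fmodel, params = make_functional(model)

x = torch.randn(10, 3)       # a batch of 10 inputs
out = fmodel(params, x)      # functional call with explicit params
print(out.shape)             # torch.Size([10, 2])
```

Since `layers` only changes how the `nn.Sequential` is assembled (not the forward logic itself), I would expect each depth setting to trace/compile to a straight-line graph, but I am not sure whether one construction pattern is preferred over another for that.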