In the process of hyperparameter tuning, I want to be able to try the model with different numbers of layers. Someone asked a similar question in Create a network where the number/type of layers are given as parameter input, but it didn't have an answer.
Is there a canonical way to do this that gives the highest performance (and, hopefully, works well with functorch's make_functional)?
For example, for a deep sets architecture we might write something like:
import torch.nn as nn
from copy import deepcopy

n_in = 3  # set for problem
n_out = 2  # set for problem
hidden_dim = 128  # TUNED!
activator = nn.ReLU()  # TUNED! Maybe nn.Tanh/etc.
last_activator = nn.Softplus()  # TUNED! Maybe just nn.Identity()/etc.
layers = 4  # TUNED!
model = nn.Sequential(
    nn.Linear(n_in, hidden_dim),
    activator,
    # Add in layers - 1 hidden blocks
    *[
        nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            deepcopy(activator),
        )
        for i in range(layers - 1)
    ],
    nn.Linear(hidden_dim, n_out, bias=True),
    last_activator,
)
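For what it's worth, one alternative I've considered (a sketch, not necessarily the canonical answer I'm asking about) is to wrap the construction in a factory that takes activation *classes* instead of instances, so every layer gets a fresh module and the deepcopy goes away. The `build_model` name and its parameters are my own invention here:

```python
import torch
import torch.nn as nn

def build_model(n_in, n_out, hidden_dim, layers, activation_cls=nn.ReLU,
                last_activation_cls=nn.Softplus):
    # Each call to activation_cls() creates a fresh module instance,
    # so no deepcopy of a shared activation is needed.
    blocks = [nn.Linear(n_in, hidden_dim), activation_cls()]
    for _ in range(layers - 1):
        blocks += [nn.Linear(hidden_dim, hidden_dim), activation_cls()]
    blocks += [nn.Linear(hidden_dim, n_out, bias=True), last_activation_cls()]
    return nn.Sequential(*blocks)

model = build_model(n_in=3, n_out=2, hidden_dim=128, layers=4)
out = model(torch.randn(5, 3))  # batch of 5 inputs
```

This makes the layer count a plain function argument, which at least keeps the tuning loop tidy even if it doesn't answer the performance question.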
Is there a pattern with the highest performance for this sort of flexible-layer construction, or is the performance identical due to tracing? What about if we try to use make_functional and apply this across batches? Or even with future incarnations of functorch's aot_function?
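For context on the make_functional side: that API has since been folded into torch.func, so the stateless call I have in mind looks roughly like the sketch below (assuming torch >= 2.0; the small model here is just a stand-in for the deep sets network above):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, 2))

# Extract parameters as a dict, then apply the module statelessly --
# the same idea as functorch's make_functional, via torch.func.functional_call.
params = dict(model.named_parameters())
x = torch.randn(7, 3)
out = torch.func.functional_call(model, params, (x,))
```

The question is whether the variable-depth nn.Sequential construction interacts badly with this style of stateless/transformed execution, or whether tracing erases any difference.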