I am working on a project that involves training many neural nets of the same “format” but varying depth. It is thus useful to have a function or class that takes “depth” as an argument (or, more generally, a variable-length list of layer sizes) and returns a neural net of the requisite depth. For example, in the case of simple feedforward ReLU nets, the following function suffices:
import torch.nn as nn

def make_relu_net(layer_sizes):
    stack = []
    for i, input_size in enumerate(layer_sizes[:-1]):
        output_size = layer_sizes[i + 1]
        stack.append(nn.Linear(input_size, output_size))
        # ReLU after every Linear except the last
        if i < len(layer_sizes) - 2:
            stack.append(nn.ReLU())
    return nn.Sequential(*stack)
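For example (the layer sizes here are arbitrary):

net = make_relu_net([784, 128, 10])
print(net)
# Sequential(
#   (0): Linear(in_features=784, out_features=128, bias=True)
#   (1): ReLU()
#   (2): Linear(in_features=128, out_features=10, bias=True)
# )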
However, I am wondering how to extend this to more general neural net layouts that don't necessarily fit the nn.Sequential format. It's my understanding that nn.Module should be used for more general network construction. Sticking with the ReLU-net example for simplicity, my naive attempt would look like this:
class FeedForwardNet(nn.Module):
    def __init__(self, layer_sizes):
        super().__init__()
        self.stack = []
        for i, input_size in enumerate(layer_sizes[:-1]):
            output_size = layer_sizes[i + 1]
            self.stack.append(
                nn.Linear(input_size, output_size))

    def forward(self, x):
        for layer in self.stack[:-1]:
            x = layer(x).clamp(min=0)
        return self.stack[-1](x)
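The forward pass itself seems to work; for example (again with arbitrary sizes):

import torch

net = FeedForwardNet([4, 8, 2])
out = net(torch.randn(3, 4))  # out has shape (3, 2)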
There are a few things that seem problematic to me with the above class:
- The main issue is that there's a for-loop in the forward method. In all the PyTorch examples of deep nets I've seen online, I don't think I've seen any with for-loops in forward. My assumption is that this is because a loop would be slow compared to hard-coding the network layout. But my use case requires variable depth, so I'm not sure how to get around this. Is it okay to have the for-loop there, or is there a better way, e.g., not subclassing nn.Module?
- Given that all the layers are stored in self.stack, it seems the state_dict is never populated (see the quick check after this list). Is there a way to generate a proper state_dict that still allows for the variable number of layers my use case requires?
- Smaller issue: is it proper to include ReLU layers in __init__ as attributes of the class (like the Linear layers), or to use clamp in forward as in my code? More generally, for any non-weight layer: is it better to compute it on the fly in forward or to make it an attribute in __init__?
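To make the second issue concrete, here is a quick check (the layer sizes are arbitrary):

net = FeedForwardNet([4, 8, 2])
print(net.state_dict())        # OrderedDict() -- empty, nothing was registered
print(list(net.parameters()))  # [] -- an optimizer would find nothing to train

From skimming the docs, nn.ModuleList looks like it might be the intended fix, since it registers its contents as submodules, though I'm not sure it's idiomatic; a sketch of what I have in mind:

class FeedForwardNet(nn.Module):
    def __init__(self, layer_sizes):
        super().__init__()
        # nn.ModuleList registers each Linear as a submodule, so
        # state_dict() and parameters() should pick them all up
        self.stack = nn.ModuleList(
            [nn.Linear(n_in, n_out)
             for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
        )

    def forward(self, x):
        for layer in self.stack[:-1]:
            x = layer(x).clamp(min=0)
        return self.stack[-1](x)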