Is there a way to prepend a module to the beginning of a ModuleList?

I am trying to build a progressive autoencoder. For the encoder part I would like to prepend more layers to the front of a ModuleList. How can I achieve that? I want to avoid copying and remaking my model every time I grow it by a layer.

e.g.:

# BEFORE GROWTH
self.layers = nn.ModuleList([nn.Conv2d(256, 512, 3, 1, 1), nn.ReLU()])  # kernel size / stride / padding are placeholders

def forward(self, x):
    for layer in self.layers:
        x = layer(x)
    return x

** PREPEND a module to the top of the encoder... **

# AFTER GROWTH
self.layers = nn.ModuleList([nn.Conv2d(128, 256, 3, 1, 1), nn.ReLU(), nn.Conv2d(256, 512, 3, 1, 1), nn.ReLU()])

def forward(self, x):
    for layer in self.layers:
        x = layer(x)
    return x

I have a working prototype already, but I was constantly destroying and creating new nn.Sequential containers, which isn’t efficient at all.

Would this work?

mlist = nn.ModuleList([nn.Conv2d(3, 6, 3, 1, 1)])
mlist = nn.ModuleList([nn.Conv2d(1, 3, 3, 1, 1), *mlist])

Or are you trying to avoid exactly this?
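If it helps, here is a minimal sketch of how that unpacking trick could sit inside the model itself; the Encoder class and its grow() method are hypothetical names, and the layer shapes are placeholders:

import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        # start with the deepest block only
        self.layers = nn.ModuleList([nn.Conv2d(256, 512, 3, 1, 1), nn.ReLU()])

    def grow(self, in_channels, out_channels):
        # rebuild the ModuleList around the existing module references;
        # the old modules (and their parameters) are reused, not copied
        self.layers = nn.ModuleList(
            [nn.Conv2d(in_channels, out_channels, 3, 1, 1), nn.ReLU(), *self.layers]
        )

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

nn.ModuleList also has an insert(index, module) method, so self.layers.insert(0, module) would be another way to prepend a single module in place.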

Hi ptrblck, I think there was some mistake in my original code: I didn’t do it your way. I actually made another variable to store both the old and new values first, and then unpacked everything into an nn.Sequential() like you did. I can see how that is unnecessary and takes up extra memory. By the way, does the code above make a shallow copy of the original list? Sorry if this sounds like a beginner Python question.

It should just pass the nn.Module references around, i.e. no copy should be involved.
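For example, a quick identity check (just a sketch reusing the snippet from above):

import torch.nn as nn

conv = nn.Conv2d(3, 6, 3, 1, 1)
mlist = nn.ModuleList([conv])
mlist = nn.ModuleList([nn.Conv2d(1, 3, 3, 1, 1), *mlist])

# the new list holds the very same module object, not a copy
print(mlist[1] is conv)                # True
print(mlist[1].weight is conv.weight)  # True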

Follow-up question: do I have to make a new parameter list and feed it into my optimizer every time I increase my network’s complexity? Does PyTorch keep track of that?

E.g.:

# Before growth

Encoder = nn.Sequential(conv2d, relu)
Decoder = nn.Sequential(conv_transpose2d, sigmoid)
parameters = list(Encoder.parameters()) + list(Decoder.parameters())
optimizer = optim.Adam(parameters, lr=lr)

# After growth

Encoder = nn.Sequential(conv2d, relu, conv2d, relu)
Decoder = nn.Sequential(conv_transpose2d, relu, conv_transpose2d, sigmoid)

parameters = list(Encoder.parameters()) + list(Decoder.parameters())  # ?

# Should I grab the new parameters again?

I would try to use optimizer.add_param_group, as a complete re-initialization would remove all running estimates, if your optimizer supports these (e.g. Adam).

I did not know that, thank you for the tip.

Edited: For add_param_group, I assume you need to loop through the network, grab all the newly added weights, and then feed them into add_param_group. Is this thinking correct? The documentation says param_group is a dict, so should I feed it the state_dict()?

Sorry, missed your edit.
You could add it with:

optimizer.add_param_group({'params': torch.randn(1, requires_grad=True)})
print(optimizer.param_groups)
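To tie this back to the growing encoder, here is a sketch of what that could look like with a freshly prepended block (new_block is a hypothetical name; the 'params' entry should be the new Parameters themselves, i.e. a tensor or an iterable of tensors, not a state_dict()):

import torch.nn as nn
import torch.optim as optim

encoder = nn.Sequential(nn.Conv2d(256, 512, 3, 1, 1), nn.ReLU())
optimizer = optim.Adam(encoder.parameters(), lr=1e-3)

# grow the encoder and register only the new parameters with the optimizer;
# the old parameters keep their existing group and Adam running estimates
new_block = nn.Sequential(nn.Conv2d(128, 256, 3, 1, 1), nn.ReLU())
encoder = nn.Sequential(*new_block, *encoder)
optimizer.add_param_group({'params': list(new_block.parameters())})

print(len(optimizer.param_groups))  # 2: the original group plus the new one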