I am trying to build a progressive autoencoder. For the encoder part I would like to prepend more layers on top of a ModuleList; how can I achieve that? I want to avoid copying and remaking my model every time I grow a layer.
eg:
# BEFORE GROWTH
self.layers = nn.ModuleList([nn.Conv2d(256, 512, 3), nn.ReLU()])

def forward(self, x):
    for layer in self.layers:
        x = layer(x)
    return x
** PREPEND a module to the top of the encoder... **
# AFTER GROWTH
self.layers = nn.ModuleList([nn.Conv2d(128, 256, 3), nn.ReLU(), nn.Conv2d(256, 512, 3), nn.ReLU()])

def forward(self, x):
    for layer in self.layers:
        x = layer(x)
    return x
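For reference, nn.ModuleList has an insert method, so a new block can be prepended in place without rebuilding the model. A minimal sketch under that approach (the layer sizes and kernel settings here are made up for illustration):

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        # initial (deepest) block
        self.layers = nn.ModuleList([nn.Conv2d(256, 512, 3, padding=1), nn.ReLU()])

    def grow(self, in_ch, out_ch):
        # prepend a new conv block in place; the existing modules are not copied
        self.layers.insert(0, nn.ReLU())
        self.layers.insert(0, nn.Conv2d(in_ch, out_ch, 3, padding=1))

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

enc = Encoder()
enc.grow(128, 256)               # encoder now starts with a 128->256 block
out = enc(torch.randn(1, 128, 8, 8))
```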
I have a working prototype model already, but I was constantly destroying and creating new nn.Sequential containers, which isn't efficient at all.
Hi ptrblck, I think there was a mistake in my original code; I didn't do it your way. I actually made another variable to store both the old and new values first, then unpacked everything into an nn.Sequential() like you did. I can see how that is unnecessary and takes up extra memory. By the way, does the code above make a shallow copy of the original list? Sorry if this sounds like a beginner Python question.
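On the copy question: unpacking a ModuleList into nn.Sequential is shallow in the sense that the new container holds references to the same module objects, so no weights are duplicated. A quick check (layer sizes made up):

```python
import torch.nn as nn

layers = nn.ModuleList([nn.Conv2d(256, 512, 3), nn.ReLU()])
seq = nn.Sequential(*layers)

# the Sequential references the same module objects, not copies,
# so the parameters (weights) are shared between the two containers
same_module = seq[0] is layers[0]
same_weight = seq[0].weight is layers[0].weight
```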
Follow-up question: do I have to make a new parameter list and feed it into my optimizer every time I increase my network's complexity, or does PyTorch keep track of that?
Eg:
# Before growth
Encoder = nn.Sequential(nn.Conv2d(...), nn.ReLU())
Decoder = nn.Sequential(nn.ConvTranspose2d(...), nn.Sigmoid())
parameter = list(Encoder.parameters()) + list(Decoder.parameters())
optimizer = optim.Adam(parameter, lr)

# After growth
Encoder = nn.Sequential(nn.Conv2d(...), nn.ReLU(), nn.Conv2d(...), nn.ReLU())
Decoder = nn.Sequential(nn.ConvTranspose2d(...), nn.ReLU(), nn.ConvTranspose2d(...), nn.Sigmoid())
parameter = list(Encoder.parameters()) + list(Decoder.parameters()) ?
# Should I grab the new parameters again?
I would try to use optimizer.add_param_group, as a complete re-initialization would remove all running estimates, if your optimizer supports these (e.g. Adam).
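A minimal sketch of that suggestion (the new_block name and layer sizes are made up): keep the existing optimizer and register only the freshly created parameters, so Adam's running estimates for the old parameters survive:

```python
import torch.nn as nn
import torch.optim as optim

model = nn.ModuleList([nn.Conv2d(256, 512, 3), nn.ReLU()])
optimizer = optim.Adam(model.parameters(), lr=1e-3)

# grow the network, then register only the new block's parameters;
# the existing param group and its optimizer state are left untouched
new_block = nn.Conv2d(128, 256, 3)
model.insert(0, new_block)
optimizer.add_param_group({'params': new_block.parameters()})
```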
Edited: for add_param_group, I assume I need to loop through the entire network, grab all newly added weights, and feed them into add_param_group; is this thinking correct? The documentation says param_group is a dict, so should I feed in the state_dict()?