Can PyTorch use stateful optimizers for a dynamic number of weights?


I would like to implement the following project in PyTorch:

I already started to implement some simple networks (which is quite elegant in PyTorch).

The linked project uses a dynamic number of layers and weights: the model adds new layers when required. Is it possible in PyTorch to optimize such a network with stateful optimizers (e.g. Adam, Adagrad, etc.) without losing the advantages of their internal state?

Looking at the source for Adam, it loops over every parameter in every parameter group and does its calculations separately for each parameter, keeping that parameter's state in self.state[p].

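    for group in self.param_groups:
        for p in group['params']:
            # State is stored per parameter, keyed by the parameter itself
            state = self.state[p]

            # State initialization: only runs the first time p is seen
            if len(state) == 0:
                ...  # create the state entries for p

            # do calculations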

So when you add a new parameter group to the optimiser using optimizer.add_param_group({'params': new_module.parameters()}) (a 'name' key in the dict is optional), the old parameters keep their existing optimiser state, and the new parameters' optimiser state is initialised correctly the first time they are stepped.

Other optimisers should work similarly.
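To make this concrete, here is a minimal sketch, assuming a recent PyTorch version. The growing model (an nn.ModuleList of small linear layers), the train_step helper and the toy loss are hypothetical and only there for illustration; the only calls taken from the answer are Adam and add_param_group. It trains for one step, adds a layer, trains again, and then compares the per-parameter step counters.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical model that starts with one layer and grows later.
layers = nn.ModuleList([nn.Linear(4, 4)])
optimizer = torch.optim.Adam(layers.parameters(), lr=1e-3)


def train_step():
    """One optimisation step on random data with a toy loss."""
    x = torch.randn(8, 4)
    for layer in layers:
        x = torch.tanh(layer(x))
    loss = x.pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


train_step()
old_param = layers[0].weight
# Adam has now created state (step counter, moment estimates) for old_param.
steps_before = int(optimizer.state[old_param]["step"])

# Grow the model and register the new parameters as an extra group.
new_layer = nn.Linear(4, 4)
layers.append(new_layer)
optimizer.add_param_group({"params": new_layer.parameters()})

# The new parameters have no state until their first step.
new_has_state_before_step = new_layer.weight in optimizer.state

train_step()
steps_old = int(optimizer.state[old_param]["step"])
steps_new = int(optimizer.state[new_layer.weight]["step"])
print(steps_before, steps_old, steps_new, new_has_state_before_step)
```

If the old parameter's step counter keeps counting from where it was while the new parameter starts from scratch, the existing state survived the call to add_param_group, which is the behaviour described above.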

Thanks for the great answer:)!

