I’ve written code that adds hidden neurons to each hidden layer during training. I’m using the Adam optimizer, which keeps per-parameter state (the moving averages exp_avg and exp_avg_sq) that effectively gives each parameter its own learning rate. At the moment I create a new optimizer over the new model parameters whenever nodes are added to the hidden layers, which discards that per-parameter state for all the parameters that existed before the neurons were added.
How do I keep the same per-parameter learning rates? Is the correct way to do this to add rows and columns of zeros to the parameters, exp_avg, and exp_avg_sq, or is there an easier way?
There are no new modules: the neurons are added to existing layers, not as new layers. When I print the network parameters, they all show up in the same parameter group.
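For what it's worth, here is a minimal sketch of the zero-padding approach described above, assuming a single hidden layer sandwiched between two nn.Linear modules. The helper names (grow_hidden, pad_state) are hypothetical, not part of PyTorch: the old weights are copied into larger layers, a fresh Adam optimizer is built, and the old exp_avg / exp_avg_sq tensors are zero-padded to the new shapes and installed in the new optimizer's state dict before the next step.

```python
import torch
import torch.nn as nn

def pad_state(t, new_shape):
    """Zero-pad an Adam state tensor so it matches a grown parameter's shape.
    Old values keep their positions; new rows/columns start at zero."""
    out = torch.zeros(new_shape, dtype=t.dtype, device=t.device)
    out[tuple(slice(0, s) for s in t.shape)] = t
    return out

def grow_hidden(fc1, fc2, optimizer, n_new):
    """Add n_new neurons to the hidden layer between fc1 and fc2 (hypothetical
    helper). Returns the new layers and a rebuilt Adam optimizer that keeps
    the per-parameter state of the pre-existing weights."""
    h = fc1.out_features
    new_fc1 = nn.Linear(fc1.in_features, h + n_new)
    new_fc2 = nn.Linear(h + n_new, fc2.out_features)
    with torch.no_grad():
        # fc1 gains output rows, fc2 gains input columns; copy old weights in.
        new_fc1.weight[:h] = fc1.weight
        new_fc1.bias[:h] = fc1.bias
        new_fc2.weight[:, :h] = fc2.weight
        new_fc2.bias.copy_(fc2.bias)

    new_opt = torch.optim.Adam(
        list(new_fc1.parameters()) + list(new_fc2.parameters()),
        lr=optimizer.param_groups[0]["lr"])

    # Transfer Adam state: map each old parameter to its grown counterpart.
    pairs = [(fc1.weight, new_fc1.weight), (fc1.bias, new_fc1.bias),
             (fc2.weight, new_fc2.weight), (fc2.bias, new_fc2.bias)]
    for old_p, new_p in pairs:
        old_state = optimizer.state.get(old_p)
        if not old_state:
            continue  # parameter was never stepped; leave state lazy-initialized
        new_opt.state[new_p] = {
            "step": old_state["step"],
            "exp_avg": pad_state(old_state["exp_avg"], new_p.shape),
            "exp_avg_sq": pad_state(old_state["exp_avg_sq"], new_p.shape),
        }
    return new_fc1, new_fc2, new_opt
```

The zeros are the same values a fresh optimizer would start from, so the new neurons behave like newly initialized parameters while the old ones continue with their accumulated moments; copying "step" keeps Adam's bias correction consistent for the old entries (it is slightly off for the new zero entries, which is usually acceptable since they have no history anyway).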