I have questions about training networks progressively in PyTorch.

The simplest way to do it is to train certain layers while having the other layers act as the identity function. You can select which parameters you want to train when constructing your optimizer (https://pytorch.org/docs/stable/optim.html#per-parameter-options). To speed things up, you can avoid computing gradients for the modules you don’t train by setting `requires_grad = False` on their parameters (https://pytorch.org/docs/stable/notes/autograd.html#excluding-subgraphs-from-backward).
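A minimal sketch of that idea, assuming a simple `nn.Sequential` model (the layer sizes, the use of `nn.Identity` as the placeholder, and the SGD settings are just illustrative choices, not the only way to do this):

```python
import torch
import torch.nn as nn

# Hypothetical progressive setup: start with one real block and an
# identity placeholder for a block that hasn't been "grown" yet.
model = nn.Sequential(
    nn.Linear(16, 16),  # stage 0: trained first
    nn.Identity(),      # stage 1: placeholder, acts as the identity
    nn.Linear(16, 4),   # output head
)

# Later, grow the network by swapping the placeholder for a real layer...
model[1] = nn.Linear(16, 16)

# ...and freeze the earlier stage so autograd skips its gradients.
for p in model[0].parameters():
    p.requires_grad = False

# Pass only the trainable parameters to the optimizer
# (per-parameter selection, as in the linked optim docs).
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=1e-2
)
```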

Dynamically adding/removing modules is relatively easy in TensorFlow/Keras, since a static graph of the model is available on the Python side. In PyTorch the graph is built on the fly during the forward pass, so you cannot traverse the graph of your model to insert modules.

What you could do is re-instantiate a model with more layers, but then you will have trouble loading the state_dict. It’s still doable: first get the state_dict of the new, bigger model, copy over the available key/value pairs from the state_dict of the smaller model, and then load the mutated state_dict into the new model. That’s what I do myself when replacing modules inside a network.
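A rough sketch of that copy-the-matching-keys approach; the small/large models and the block names ("block0", "block1", "head") are made up for illustration:

```python
from collections import OrderedDict

import torch.nn as nn

# Hypothetical small model and its bigger successor; "block1" is new.
small = nn.Sequential(OrderedDict([
    ("block0", nn.Linear(8, 8)),
    ("head", nn.Linear(8, 2)),
]))
large = nn.Sequential(OrderedDict([
    ("block0", nn.Linear(8, 8)),
    ("block1", nn.Linear(8, 8)),  # newly added layer
    ("head", nn.Linear(8, 2)),
]))

# Start from the big model's state_dict and overwrite every entry that
# also exists (with a matching shape) in the small model's state_dict.
state = large.state_dict()
for key, value in small.state_dict().items():
    if key in state and state[key].shape == value.shape:
        state[key] = value

large.load_state_dict(state)  # "block1" keeps its fresh initialization
```

If your naming scheme keeps the shared keys identical like this, `large.load_state_dict(small.state_dict(), strict=False)` should achieve the same thing in one call, since `strict=False` tolerates the keys that are missing from the incoming state_dict.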

Good luck!