I want to train a model for, say, K iterations. After those K iterations I want to essentially “freeze” that model and keep training, say, 10 different versions of it, which are independent and so in theory could be run in parallel. How I continue the training matters less than the fact that I don’t want the models to accidentally interfere with each other (since backprop accumulates gradients across ops in pytorch). In other words, the copying step and the different independent training continuations should not interfere with one another.
In a different framework (say MATLAB or NumPy) this would be easy: one could just replicate the weight matrices and then keep applying the gradient descent update to each copy. Since nothing in MATLAB ties one matrix to another, there would be no issue.
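To make the analogy concrete, here is roughly what I mean in NumPy (just a toy weight matrix and a made-up gradient step, purely for illustration):

import numpy as np

W = np.random.randn(3, 3)        # weights after the first K iterations
W_copy = W.copy()                # independent storage, nothing ties it back to W
grad = np.ones_like(W_copy)      # stand-in gradient for illustration
W_copy -= 0.1 * grad             # updating the copy leaves W untouched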
Is there a way to do this? Something like the following pseudocode:
mdl = train_with_SGD_for(K_iterations)
## continue training from where the K iterations left off
for i in range(10):
    new_mdl = create_decoupled_mdl_copy(mdl)
    ## new training procedure written manually in pytorch, or a new objective function (for example)
    new_train_update = new_train_procedure_or_objective()
    trained_new_mdl = train_to_completion(new_mdl, new_train_update)
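If copy.deepcopy of an nn.Module really does give fully independent parameters, I imagine it would look something like the sketch below (toy model, toy data, and plain SGD standing in for the “new” training procedure; all the names here are just placeholders I made up):

import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
X, y = torch.randn(100, 3), torch.randn(100, 1)   # toy data, placeholder for the real dataset

mdl = nn.Sequential(nn.Linear(3, 16), nn.ReLU(), nn.Linear(16, 1))
opt = torch.optim.SGD(mdl.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

# phase 1: train the base model for K iterations
K = 50
for _ in range(K):
    opt.zero_grad()
    loss_fn(mdl(X), y).backward()
    opt.step()

# phase 2: fork 10 decoupled copies and train each one independently
trained_copies = []
for i in range(10):
    new_mdl = copy.deepcopy(mdl)                               # new parameter tensors, no shared graph with mdl
    new_opt = torch.optim.SGD(new_mdl.parameters(), lr=0.05)   # fresh optimizer state per copy
    for _ in range(20):                                        # stand-in for the new training procedure/objective
        new_opt.zero_grad()
        loss_fn(new_mdl(X), y).backward()
        new_opt.step()
    trained_copies.append(new_mdl)

Is that the right way to do it, or is there a more canonical way to decouple the copies?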