Thanks, @KFrank, for looking into it and for your suggestion.
Yes, for now, my scheme of saving the weights and then loading them back is working, but it is 8-10 times slower than normal training.
Also, what if I use the noisy model to do the forward pass and compute the loss, and then use that loss to backpropagate and update my original model?
So my question is: can we do that, i.e., optimize a model based on another model’s output? The other model would closely track the weights of the original model, but would add noise to the weights, or transform them somehow, in each forward pass.
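To make the question concrete, here is a minimal sketch of the kind of update I have in mind. It is only an illustration under assumptions: it assumes PyTorch 2.x (for `torch.func.functional_call`), and the toy `nn.Linear` model, the random data, the Gaussian noise, and the names `noisy_step` and `noise_std` are all hypothetical, not my actual setup. The noisy weights are built as `p + noise`, so they stay connected to the original parameters in the autograd graph, and no saving or reloading of weights is involved.

```python
import torch
import torch.nn as nn
from torch.func import functional_call

torch.manual_seed(0)

# Hypothetical toy model and data, used only for illustration.
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(8, 4)
y = torch.randn(8, 1)
noise_std = 0.01  # assumed noise scale


def noisy_step(model, optimizer, x, y, noise_std):
    """One training step: forward with noisy weights, update the original ones."""
    # Fresh noise on every call. Each noisy tensor is p + noise, so the
    # gradient with respect to it flows straight back to the original p.
    noisy_params = {
        name: p + noise_std * torch.randn_like(p)
        for name, p in model.named_parameters()
    }
    optimizer.zero_grad()
    # Run the same module, but with the noisy tensors substituted for its weights.
    out = functional_call(model, noisy_params, (x,))
    loss = nn.functional.mse_loss(out, y)
    # Gradients are evaluated at the noisy weights and stored in p.grad
    # of the original parameters.
    loss.backward()
    # The optimizer updates the original (noise-free) parameters.
    optimizer.step()
    return loss.item()


before = [p.detach().clone() for p in model.parameters()]
loss_value = noisy_step(model, optimizer, x, y, noise_std)
```

If something like this is valid, the same pattern should presumably extend to other differentiable transformations of the weights, by replacing `p + noise` with the transformation.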
Best,
Atif