I run my computations on a server cluster where jobs have a time limit, but my training process over multiple epochs typically takes longer than this limit. Therefore, I regularly store the state of my computations (i.e., after every epoch) and resume them when one job finishes and the next one starts.
In particular, this involves calling .state_dict() and .load_state_dict() on the model and the optimizer.
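For reference, my current per-epoch checkpointing looks roughly like this (the model, optimizer, file path, and dictionary keys are just illustrative):

```python
import torch
from torch import nn

# Illustrative model and optimizer; the real ones are larger
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# After every epoch: persist both state dicts in one file
checkpoint = {
    "epoch": 3,  # illustrative epoch counter
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
}
torch.save(checkpoint, "checkpoint.pt")

# In the next job: rebuild the objects and restore their states
model2 = nn.Linear(4, 2)
optimizer2 = torch.optim.SGD(model2.parameters(), lr=0.1)
ckpt = torch.load("checkpoint.pt")
model2.load_state_dict(ckpt["model"])
optimizer2.load_state_dict(ckpt["optimizer"])
```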
Now I would like to integrate differential privacy with Opacus into my computations, and I am wondering whether and how I can store the state of Opacus and resume it. Specifically, my questions are:
1. Can I simply torch.save() the PrivacyEngine object?
2. If the answer to 1) is no, do a .state_dict property and a .load_state_dict() function exist for the PrivacyEngine?
3. Alternatively, can I just run model.load_state_dict() and optimizer.load_state_dict(), then wrap the restored objects with privacy_engine.make_private(), and use them as usual?
4. In the case of 3), can I store epsilon = privacy_engine.get_epsilon(DELTA) after every epoch (along with the model state and the optimizer state) and, after resuming, add the previously stored epsilon to the newly computed one? Or is epsilon not additive in this way, so that I would end up with a wrong epsilon value when resuming the calculations?
Thank you very much in advance!