How to store and resume the state of the PrivacyEngine?

Hi,

I run my computations on a server cluster where jobs have a time limit, but my training process of multiple epochs typically takes longer than this limit. Therefore, I regularly store the state of my computations (i.e., after every epoch), and then resume the computations when a job has finished and I have started a new one.

In particular, this involves the .load_state_dict() methods of the model and the optimizer.

Now I would like to integrate differential privacy with Opacus into my computations, and I am wondering whether and how I can store the state of Opacus and resume from it. Specifically, my questions are:

  1. Can I just torch.save() the PrivacyEngine object?
  2. If the answer to 1) is no, is there a .state_dict property and .load_state_dict() function for the PrivacyEngine?
  3. Can I alternatively just run the model.load_state_dict() and optimizer.load_state_dict(), then update these by running privacy_engine.make_private(), and use the objects as normal?
  4. In the case of 3), can I store the epsilon = privacy_engine.get_epsilon(DELTA) after every epoch (along with the model state and the optimizer state), and add the previous epsilon value to the epsilon value obtained after resuming the calculations? Or is this not an additive relation and I would end up with a wrong epsilon value when resuming the calculations?

Thank you very much in advance!


Hi,
Thanks for your question. The ability to save/load checkpoints is an important feature, and it's good for us to get some input on how people would use it.

While we're considering how to include this in the Opacus API, here's what you need to know to make it work today.

  1. PrivacyEngine doesn't maintain links to the model, optimizer, or data_loader. The only important state maintained by PrivacyEngine is the accountant. The accountant's state is just a list of numerical tuples, so torch.save() or any other pickle mechanism should do. That said, you should probably save the accountant (privacy_engine.accountant), not the privacy engine itself. That's because we also keep a link to the dataset used in the first call to the make_private() method, as a sanity check that the dataset is not swapped in the middle of training (accounting is performed on a per-dataset basis).
  2. Although GradSampleModule is functionally just a wrapper around nn.Module, saving/loading it probably doesn't work out of the box. I'd say your best bet is to save/load the underlying model and wrap it with GradSampleModule again every time you restore from a checkpoint.

To summarise, here are the steps you need to take:

On saving:

  1. Save the accountant: torch.save(privacy_engine.accountant, accountant_path)
  2. Save the model: torch.save(model._module.state_dict(), model_path)
  3. If your optimizer has state (e.g. dynamic lr or momentum), save the wrapped (original) optimizer: torch.save(optimizer.original_optimizer.state_dict(), optimizer_path)

On loading:

  1. Initialize an empty PrivacyEngine
  2. Load the accountant and use it to replace the brand-new one in the PrivacyEngine you've just initialized: privacy_engine.accountant = accountant_you_have_just_loaded_from_checkpoint
  3. Load your non-private nn.Module as normal
  4. Load your non-private optimizer as normal
  5. Pass the loaded model and optimizer to privacy_engine.make_private()

Thank you very much, Igor, for this well-structured answer. I will try it out.