Noise addition to weights in a Federated Learning setting

I am planning to use Opacus to implement differential privacy in my federated learning model, but I have a very basic question that I would love to have cleared up before that.

So as far as my understanding goes, using Opacus, we use an optimizer like DP-SGD that adds differentially private noise to each batch of each client’s dataset while they are in “local training”. And in federated learning, we train client models for a few “local epochs” before sending their weights out to a central server for aggregation, and we add noise to the model weights before sending them out.

So my question is: why do we use DP-SGD to add noise to every single batch of every single client’s dataset during local training, when we could just add noise to the local weights before they are sent out? Why not let the local training epochs happen as is and simply add noise to the outbound weights at the time of departure? What am I missing?

Hi @Anirban_Nath

So as far as my understanding goes, using Opacus, we use an optimizer like DP-SGD that adds differentially private noise to each batch of each client’s dataset while they are in “local training”

That really depends on how exactly you plug Opacus into your FL setup.
Opacus is designed for the central-DP model with server-side training and provides sample-level privacy guarantees - that’s why the noise is added to every batch.

With an FL threat model, you can absolutely do the clipping and noise addition at the client level instead. You would need to compute the gradients for each client and clip each client’s overall gradient. Where you add the noise (for every client or for a batch of clients) depends on your exact threat model too.

Hi @ffuuugor

Where you add the noise (for every client or for a batch of clients) depends on your exact threat model too.

So my working idea of differential privacy in the context of Federated Learning is that we add noise to the outbound weights. If that is my intention, how do I set this up with Opacus exactly? Do I use DP-SGD to add noise during every batch of local training for every user, or should I add noise only when the weights are sent from the client to the server?

First, I want to correct one small yet important detail. With DP-SGD we always work with gradients, not weights. It doesn’t really matter when we’re adding noise, but it’s important when we do clipping.
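
To make that concrete, here is a toy sketch of the order of operations in one DP-SGD step - the tensors and constants below are made up for illustration, not Opacus internals:

import torch

# Toy illustration of one DP-SGD step: clipping acts on each per-sample
# gradient, and noise is added to their (clipped) sum before the result
# ever touches the weights. Shapes and constants are arbitrary.
per_sample_grads = torch.randn(32, 10)   # 32 samples, 10 flattened parameters
max_grad_norm = 1.2
noise_multiplier = 0.4

# 1) clip every sample's gradient to norm <= max_grad_norm
norms = per_sample_grads.norm(dim=1, keepdim=True)
clipped = per_sample_grads * torch.clamp(max_grad_norm / (norms + 1e-6), max=1.0)

# 2) sum the clipped gradients and add Gaussian noise scaled to the clip bound
noisy_sum = clipped.sum(dim=0) + torch.normal(
    0.0, noise_multiplier * max_grad_norm, size=(10,)
)

# 3) the optimizer applies the (averaged) noisy gradient to the weights
noisy_grad = noisy_sum / per_sample_grads.shape[0]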

The answer to your question depends on what you want to achieve and what definition of privacy you want to adopt. In classic DP-SGD, assuming server-side training, the formal threat model is an adversary having access to every gradient update, i.e. each batch gradient. That’s why we add noise on every training step.

If you want to keep this definition and adopt sample-level privacy, you want to do clipping and noise on every local batch.
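
In that case the setup is basically the standard Opacus flow repeated independently for each client. A minimal runnable sketch (toy model, data and noise_multiplier, just to show where Opacus hooks in):

import torch
from opacus import PrivacyEngine

# Sketch: sample-level DP, i.e. per-sample clipping + noise on every local batch,
# done independently for each client.
def local_train(base_model, client_loader, local_epochs=2):
    optimizer = torch.optim.SGD(base_model.parameters(), lr=0.05)
    privacy_engine = PrivacyEngine()
    model, optimizer, client_loader = privacy_engine.make_private(
        module=base_model,
        optimizer=optimizer,
        data_loader=client_loader,
        noise_multiplier=1.0,     # arbitrary value for the sketch
        max_grad_norm=1.2,
    )
    for _ in range(local_epochs):
        for x, y in client_loader:
            optimizer.zero_grad()
            torch.nn.functional.cross_entropy(model(x), y).backward()
            optimizer.step()      # per-sample clipping + noise on every batch
    return model.state_dict()

# one toy client
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,))),
    batch_size=16,
)
client_state = local_train(torch.nn.Linear(10, 2), loader)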

If you want to go for user-level privacy and you assume a trusted aggregator, you would need to do the following (a rough sketch follows the list):

  1. Calculate each client’s accumulated gradients over a few local epochs
  2. Clip each client’s gradients
  3. Add noise once per round
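
Very roughly, those three steps could look something like this with a trusted aggregator. Everything here is a placeholder (toy models, max_update_norm, noise_multiplier), and the noise calibration is only illustrative, not a claim about the correct sensitivity for your setup:

import copy
import torch

# Sketch of user-level DP with a trusted aggregator; assumes all
# parameters are floating point.
max_update_norm = 1.0
noise_multiplier = 0.5

global_model = torch.nn.Linear(10, 2)
client_models = [copy.deepcopy(global_model) for _ in range(2)]
# ... imagine each client_model has been trained locally for a few epochs ...

global_state = copy.deepcopy(global_model.state_dict())
clipped_updates = []
for client_model in client_models:
    # 1) the client's accumulated update over its local epochs
    update = {k: client_model.state_dict()[k] - global_state[k] for k in global_state}
    # 2) clip the whole per-client update to norm <= max_update_norm
    total_norm = torch.sqrt(sum((v ** 2).sum() for v in update.values()))
    scale = torch.clamp(max_update_norm / (total_norm + 1e-6), max=1.0)
    clipped_updates.append({k: v * scale for k, v in update.items()})

# 3) average the clipped updates and add noise once per round
new_state = {}
for k in global_state:
    avg_update = torch.stack([u[k] for u in clipped_updates]).mean(dim=0)
    noise = torch.normal(
        0.0, noise_multiplier * max_update_norm / len(client_models), size=avg_update.shape
    )
    new_state[k] = global_state[k] + avg_update + noise
global_model.load_state_dict(new_state)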

If you can’t trust the aggregator, you can also add noise to every client’s gradients, but I’m not sure how the privacy accounting would work in this case.
I also suggest you take a look at this paper on distributed noise generation - a somewhat intermediate model between the two I described above.

Sorry if this makes things more confusing for you :slight_smile:

Hi. So I wrote some code as I saw fit, and I am going for client-side privacy, i.e. assuming a trusted aggregator. What I did is create two copies of the base model, each with its own optimizer and dataloader. I then passed each (model, optimizer, dataloader) trio through the make_private_with_epsilon function and saved the privatized outputs as .pth files for future use.

I am not using Flower or any such aggregator for enforcing DP. What I do is load each saved model one after another, run it for a few local epochs, and save the trained model and optimizer states back to .pth files. After both models have finished their local epochs, I average their weights as per vanilla Federated Averaging and initialize both models with these averaged weights at the start of the next global epoch.
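
In pseudocode, the averaging step is essentially this (a simplified sketch, not my exact code; the file names and state-dict keys match how I save the models, and it assumes all saved parameters are floating point):

import torch

# Vanilla FedAvg over the two saved client checkpoints
paths = ['basemodels/basemodel-u1.pth', 'basemodels/basemodel-u2.pth']
checkpoints = [torch.load(p) for p in paths]
states = [ckpt['model_state_dict'] for ckpt in checkpoints]

# element-wise average of the two clients' weights
avg_state = {k: torch.stack([s[k] for s in states]).mean(dim=0) for k in states[0]}

# both clients start the next global epoch from the averaged weights
for ckpt, path in zip(checkpoints, paths):
    ckpt['model_state_dict'] = avg_state
    torch.save(ckpt, path)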

However, I have one minor issue - after training for a fixed number of local epochs and before federated averaging, I try to check how much privacy budget has been spent per model using the get_epsilon function (my delta value is 1e-5 and my target epsilon is 1.0), but every time I get 0 as the output. Is this normal, or is there something wrong with my pipeline?

This doesn’t sound right. After applying at least one optimizer step you should get a non-zero epsilon. Could you share a code snippet for us to investigate? Make sure you use the optimizer returned from the make_private() call.
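
For reference, this is roughly the minimal sequence I would expect to report a non-zero epsilon after a single step (toy model and data, just to show the call order, not your setup):

import torch
from opacus import PrivacyEngine

# Toy check: epsilon should become positive after one DP optimizer step.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
dataset = torch.utils.data.TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))
data_loader = torch.utils.data.DataLoader(dataset, batch_size=16)

privacy_engine = PrivacyEngine()
model, optimizer, data_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=data_loader,
    noise_multiplier=1.0,
    max_grad_norm=1.0,
)

x, y = next(iter(data_loader))
optimizer.zero_grad()
torch.nn.functional.cross_entropy(model(x), y).backward()
optimizer.step()  # one DP-SGD step: clip, noise, update

print(privacy_engine.get_epsilon(delta=1e-5))  # expect a small positive number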

import os
import copy

import torch
from opacus import PrivacyEngine
from opacus.validators import ModuleValidator

unet = ModuleValidator.fix(unet).to(device)
optimizer = torch.optim.Adam(unet.parameters(), lr=0.0005)

PRIVACY_PARAMS = {
    'target_delta': 1e-5,
    'noise_multiplier': 0.4,
    'max_grad_norm': 1.2
}

privacy_engine = PrivacyEngine()
PE = {}
if not os.path.exists('basemodels/basemodel-u1.pth'):
    for cid in range(num_users):
        # make_private_with_epsilon returns (model, optimizer, data_loader)
        PE[cid] = privacy_engine.make_private_with_epsilon(
            module=copy.deepcopy(unet),
            data_loader=trainloader[cid],
            optimizer=copy.deepcopy(optimizer),
            epochs=global_epochs * local_epochs,
            target_epsilon=1.0,
            batch_first=True,
            target_delta=PRIVACY_PARAMS['target_delta'],
            max_grad_norm=PRIVACY_PARAMS['max_grad_norm'],
            # noise_multiplier=PRIVACY_PARAMS['noise_multiplier']
        )
        # save the privatized model/optimizer states for later local training
        torch.save({'model_state_dict': PE[cid][0].state_dict(),
                    'optimizer_state_dict': PE[cid][1].state_dict()},
                   f'basemodels/basemodel-u{cid+1}.pth')
        trainloader[cid] = PE[cid][2]

So here, the models and optimizers reside within the dictionary PE and the dataloaders are in a separate dictionary. @Peter_Romov

Hi mate, thanks for mentioning this. Do you have a tutorial on how to average these .pth files? You mentioned saving the trained models to .pth files and then doing something like federated averaging. I’m looking for an answer on how to do this because Flower is sometimes unstable.