Arbitrary batch sizes after passing dataloader through privacy engine

Anirban_Nath · September 18, 2022, 1:19pm

In my code, I have defined the privacy engine as follows:

unet, optimizer, trainloader = privacy_engine.make_private_with_epsilon(
        module = unet,
        data_loader = trainloader,
        optimizer = optimizer,
        epochs = global_epochs * local_epochs,
        target_epsilon = 1.0,
        batch_first = True,
        target_delta = PRIVACY_PARAMS['target_delta'],
        max_grad_norm = PRIVACY_PARAMS['max_grad_norm'])

I sampled a batch from the dataloader before and after passing it through the privacy engine, and on every run the wrapped dataloader returns a seemingly arbitrary batch size. For example, if my original dataloader was created with a batch size of 6, the wrapped dataloader yields batches of 7, 10, etc., and the sizes change with every execution. Why does this happen?

karthikprasad · September 19, 2022, 11:12pm

Hello @Anirban_Nath,
This is a consequence of the Poisson sampling that a differentially private data loader requires. It is called out in the docstrings here (Opacus · Train PyTorch models with Differential Privacy). I’ll make sure this is also captured in our FAQs at FAQ · Opacus

[Tracking the update to documentation at Add an FAQ about variable batch size · Issue #514 · pytorch/opacus · GitHub]
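
To illustrate why the batch size varies, here is a minimal pure-Python sketch of Poisson sampling. It is not Opacus code: the helper name `poisson_batches` and the dataset size of 600 are made up for illustration, and it assumes the sampling rate is the configured batch size divided by the dataset size. Each sample is included in a batch independently with that probability, so the batch size is random and only its average matches the configured value.

```python
import random


def poisson_batches(dataset_size, expected_batch_size, steps, seed=0):
    """Hypothetical helper: draw `steps` batches of indices by Poisson sampling."""
    rng = random.Random(seed)
    # Assumption: every sample is picked independently with this probability.
    sample_rate = expected_batch_size / dataset_size
    batches = []
    for _ in range(steps):
        batch = [i for i in range(dataset_size) if rng.random() < sample_rate]
        batches.append(batch)
    return batches


batches = poisson_batches(dataset_size=600, expected_batch_size=6, steps=1000)
sizes = [len(b) for b in batches]
mean_size = sum(sizes) / len(sizes)

# Individual batch sizes differ from step to step; the mean stays close to 6.
print("first ten batch sizes:", sizes[:10])
print("mean batch size:", mean_size)
```

A batch can occasionally be much larger than the expected size (or even empty), which is why a fixed batch size cannot be assumed after wrapping the dataloader.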

Anirban_Nath · September 20, 2022, 8:20am

Hello @karthikprasad

So I figured that out, thanks to you, and I also found that there is a utility named BatchMemoryManager that helps keep the maximum batch size in check. However, with the two in place, when I try to run my code I keep getting a “ValueError: Per sample gradient is not initialized. Not updated in backward pass?” error at loss.backward().

Prior to that, there is a warning when I pass a batch through my model which says “Using a non-full backward hook when the forward contains multiple autograd Nodes is deprecated and will be removed in future versions. This hook will be missing some grad_input. Please use register_full_backward_hook to get the documented behavior.”

Maybe the two are correlated?

Here is a snippet of my code for your reference:

privacy_engine = PrivacyEngine()
model , optimizer, trainloader = privacy_engine.make_private_with_epsilon(
        module = unet,
        data_loader = trainloader,
        optimizer = optimizer,
        epochs = 200,
        target_epsilon = 1.0,
        batch_first = True,
        target_delta = PRIVACY_PARAMS['target_delta'],
        max_grad_norm = PRIVACY_PARAMS['max_grad_norm'])

print("Training local epochs")
for l in range(local_epochs):
    with BatchMemoryManager(data_loader=trainloader, max_physical_batch_size=4, optimizer=optimizer) as new_data_loader:
        for i_batch, sampled_batch in enumerate(new_data_loader):

            image_batch, segmask = sampled_batch['image'], sampled_batch['mask']
            image_batch, segmask = image_batch.to(device), segmask.to(device)

            #Segmentation
            outs_seg = unet(image_batch, task = 'segment') #I get warning here
            softmax = torch.nn.functional.log_softmax(outs_seg, dim=1)
            loss_seg = torch.nn.functional.nll_loss(softmax, segmask[:].long())

            optimizer.zero_grad()
            loss.backward() #I get the error here
            optimizer.step()
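
For readers unfamiliar with the idea behind BatchMemoryManager, the following is a simplified pure-Python sketch of splitting one large logical batch into physical batches of bounded size. It is not the Opacus implementation: `split_logical_batch` is a hypothetical helper, and the exact way Opacus divides a batch may differ. The point is only that every chunk fits the memory limit and that the real optimizer update belongs to the last chunk of each logical batch.

```python
def split_logical_batch(indices, max_physical_batch_size):
    """Hypothetical helper: cut a logical batch into chunks of bounded size."""
    return [
        indices[start:start + max_physical_batch_size]
        for start in range(0, len(indices), max_physical_batch_size)
    ]


# A logical batch of 10 samples, e.g. one drawn by Poisson sampling.
logical_batch = list(range(10))
chunks = split_logical_batch(logical_batch, max_physical_batch_size=4)

for n, chunk in enumerate(chunks):
    is_last_chunk = n == len(chunks) - 1
    # Gradients would be accumulated for every chunk; only the last chunk
    # of the logical batch would trigger the actual parameter update.
    print(chunk, "-> update parameters" if is_last_chunk else "-> accumulate only")
```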

karthikprasad · September 29, 2022, 4:58pm

Hello @Anirban_Nath ,
I’m glad the previous issue is resolved.

As for your new issue, the warning can be ignored. The error essentially indicates that the model being trained has layers that are not valid. Your code snippet suggests that you are using unet instead of model? That doesn’t look right.

If you are still running into an issue, could you please make a separate post with some reproducible code?

Anirban_Nath · October 3, 2022, 7:10am

Hi. I have already created a separate issue for this but for my own knowledge, what do you mean when you say that my model has layers that are not “valid”? And unet is simply my variable name for model, so I don’t think there are any issues there.