DP-SGD using Opacus

Hi all,

I have followed the tutorials on DP image classification using ResNet18.
I have some questions:

  • When a model has many layers, it wasn’t able to converge under DP. Are there any recommended approaches to overcome this problem for large models with many fully connected layers?
  • When I decreased batch_size for the same model (due to my 8 GB memory limit), the loss goes up to 80-200. That means it is challenging to use a large batch under DP, even though a large batch size sometimes helps to improve model accuracy.
  • When trying different DL models, training the non-private model takes a reasonable time (e.g., 6 minutes) compared to the private model (19 minutes), which also reaches lower accuracy.

Is DP-SGD very slow due to per-sample gradients? Is there any way to speed up processing time?

Can we achieve accuracy comparable to the baseline model under a modest privacy budget?

Are DP deep learning models more sensitive to hyperparameters such as batch size and noise level, or to the structure of the NN?

Thanks,

Hello @NBu,
I really don’t know how this post went unattended. Sorry for the delay.

  • When a model has many layers, it wasn’t able to converge under DP. Are there any recommended approaches to overcome this problem for large models with many fully connected layers?

Do you mind sharing an example notebook? Hyper-parameter tuning should typically help in these cases. Some tips: FAQ · Opacus
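One thing worth checking with ResNet18 in particular: DP-SGD cannot use BatchNorm, since it mixes information across the samples in a batch, and that alone can block convergence. A minimal sketch, assuming a recent (1.x) Opacus version where ModuleValidator can replace unsupported layers:

```python
from torchvision import models
from opacus.validators import ModuleValidator

model = models.resnet18(num_classes=10)

# List DP-incompatible modules (e.g., BatchNorm) without raising an error.
errors = ModuleValidator.validate(model, strict=False)
print(errors)

# Replace them with DP-compatible equivalents (BatchNorm -> GroupNorm).
model = ModuleValidator.fix(model)
```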

  • When I decreased batch_size for the same model (due to my 8 GB memory limit), the loss goes up to 80-200. That means it is challenging to use a large batch under DP, even though a large batch size sometimes helps to improve model accuracy.

The memory requirement is an unfortunate consequence of maintaining per-sample gradients. Opacus provides virtual batches to overcome this issue: you accumulate per-sample gradients over several small physical batches and only take an optimizer step once per large logical batch. Please try it out if you haven’t already. FAQ · Opacus
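As a rough sketch of what that looks like, assuming the current (1.x) Opacus API where virtual batches are exposed via BatchMemoryManager (model, optimizer, criterion, and data_loader are placeholders you would already have; the hyperparameter values are arbitrary):

```python
from opacus import PrivacyEngine
from opacus.utils.batch_memory_manager import BatchMemoryManager

# Wrap model/optimizer/loader for DP-SGD.
privacy_engine = PrivacyEngine()
model, optimizer, data_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=data_loader,
    noise_multiplier=1.1,
    max_grad_norm=1.0,
)

# Train with a large logical batch (set by data_loader) while only ever
# holding max_physical_batch_size per-sample gradients in memory at once.
with BatchMemoryManager(
    data_loader=data_loader,
    max_physical_batch_size=32,
    optimizer=optimizer,
) as memory_safe_loader:
    for x, y in memory_safe_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()  # only takes a real step at logical-batch boundaries
```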

Is DP-SGD very slow due to per-sample gradients? Is there any way to speed up processing time?

There is some overhead to computing per-sample gradients. The speed degradation also depends on the layers involved (e.g., Linear vs. LSTM). We can take a look at your notebook to make some suggestions.
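If you want to quantify the slowdown on your own setup, a simple sketch (all names are placeholders; run it once with the plain model/optimizer and once with the Opacus-wrapped ones and compare):

```python
import time

def mean_step_time(model, optimizer, criterion, loader, n_steps=20):
    # Average wall-clock time per training step; works unchanged for both
    # the plain and the Opacus-wrapped model/optimizer.
    model.train()
    start = time.time()
    for i, (x, y) in enumerate(loader):
        if i == n_steps:
            break
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    return (time.time() - start) / n_steps
```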

Can we achieve accuracy comparable to the baseline model under a modest privacy budget?

It depends a lot on the task. As you can see on https://opacus.ai, the answer is a resounding yes for MNIST with a small network. However, there is a ~20pp accuracy gap for CIFAR-10 with ResNet18.

Bridging this gap is an active area of research, and we wholeheartedly welcome your ideas and contributions. One of the main goals of open-sourcing Opacus is to help push this research forward.
