Some questions about (ϵ,δ)-differential privacy guarantee

liuwenshuang0211 · February 5, 2021, 5:22am

Hi all.
I am applying the Opacus package to U-Net training, but I don't know how to judge the level of privacy guarantee I am getting. Is it related to the ϵ that is output? Does ϵ being below a certain value mean that the privacy guarantee is good?

However, ϵ keeps increasing during the training process. Does this mean that the longer the training time, the lower the privacy protection level?

Sincerely looking forward to receiving your reply.

ffuuugor · February 5, 2021, 9:26pm

Hi @liuwenshuang0211.
First of all, thanks for trying out opacus, we do appreciate it.

To better understand the concept of (ε,𝛿)-differential privacy, I suggest starting with the FAQ section on our website, which has a paragraph on that: FAQ · Opacus

tl;dr - Epsilon bounds the multiplicative difference between the output distributions obtained from two datasets that differ in a single example. In other words, epsilon measures how much difference a single example can make under the worst possible circumstances.
Delta, in turn, is the probability of failure, i.e. the probability that the privacy guarantee defined by epsilon does not hold.
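
For reference, the summary above corresponds to the standard textbook definition (it is not specific to Opacus). A randomized mechanism M is (ε,δ)-differentially private if, for every pair of datasets D and D' that differ in a single example and every set of outputs S:

```latex
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(D') \in S] + \delta
```

With δ = 0 the two probabilities can differ by at most a factor of e^ε. The additive δ term is the slack that allows the multiplicative bound to be violated on rare events.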

Now, the question about threshold values for “good privacy guarantee” is tricky and hard.
For delta you typically want it to be orders of magnitude less than 1/N, where N is the number of entries in your dataset. Otherwise you don’t protect against full release of one entry.
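
To see why 1/N is the relevant scale, here is a minimal sketch based on a common textbook illustration (not something taken from Opacus): a toy mechanism that publishes each record in the clear independently with probability delta. Such a mechanism satisfies (0, delta)-DP, yet with delta = 1/N it leaks at least one record most of the time. The function name and the dataset size are made up for the illustration.

```python
def prob_any_record_leaked(n_records: int, delta: float) -> float:
    """Probability that a toy mechanism, which publishes each record
    independently with probability `delta`, leaks at least one record.
    This toy mechanism satisfies (0, delta)-DP."""
    return 1.0 - (1.0 - delta) ** n_records


n = 60_000  # hypothetical dataset size, chosen only for illustration
for delta in (1 / n, 1e-2 / n, 1e-4 / n):
    p = prob_any_record_leaked(n, delta)
    print(f"delta = {delta:.2e} -> P(at least one record leaked) = {p:.4f}")
```

With delta = 1/N the leak probability is about 1 - 1/e, and it only becomes small once delta is several orders of magnitude below 1/N.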

For epsilon, the answer is “it depends”.
I’d suggest looking at papers to see what’s considered reasonable for your task.
For example, on MNIST we’re able to get acceptable results with ε=1.19 and good results with ε=7.
In another example, the paper “Tempered Sigmoid Activations for Deep Learning with Differential Privacy” by Papernot et al. uses ε=3 for MNIST and ε=7 for CIFAR10.

There is also some evidence that the epsilon values reported by DP-SGD are on the very pessimistic end, and that the observed privacy is much better (epsilon roughly shrinks by a factor of 10).

To summarise this part, you’d want to set your 𝛿 << 1/N and keep your ε in single digits (but it depends).

As for the training process, the short answer is yes: the more training iterations you perform, the weaker the privacy guarantee is. Bear in mind, however, that the relation is sub-linear, so epsilon grows more slowly than the number of iterations.
For more of the math behind it, I would suggest looking into the original DP-SGD paper (https://arxiv.org/pdf/1607.00133.pdf), specifically section 3.1 Differentially Private SGD Algorithm, subsection “Privacy Accounting”.
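
To get a feel for the sub-linear growth, here is a rough sketch under simplifying assumptions. It composes a plain Gaussian mechanism (sensitivity 1) over many steps using Rényi DP and converts the result to (ε,δ). It ignores the privacy amplification from mini-batch sampling that the DP-SGD accountant exploits, so the numbers are not what Opacus would report. The function name, the noise multiplier and delta are all chosen just for illustration.

```python
import math


def gaussian_composition_epsilon(steps: int, noise_multiplier: float, delta: float) -> float:
    """Epsilon after `steps` releases of a Gaussian mechanism with
    sensitivity 1 and noise std `noise_multiplier`, composed via Renyi DP.

    RDP at order alpha after `steps` releases: steps * alpha / (2 * sigma^2).
    Conversion to (eps, delta)-DP: eps = rdp + log(1/delta) / (alpha - 1).
    Minimising over alpha > 1 gives the closed form returned below.
    """
    a = steps / (2.0 * noise_multiplier ** 2)
    log_inv_delta = math.log(1.0 / delta)
    return a + 2.0 * math.sqrt(a * log_inv_delta)


# Hypothetical settings, for illustration only.
for steps in (100, 1_000, 10_000):
    eps = gaussian_composition_epsilon(steps, noise_multiplier=50.0, delta=1e-5)
    print(f"steps = {steps:>6} -> epsilon = {eps:.2f}")
```

With these settings, going from 100 to 1,000 steps raises epsilon by a bit more than 3x rather than 10x, because the square-root term dominates while the number of steps is small relative to the squared noise multiplier.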

Darktex · February 9, 2021, 12:23am

About losing privacy the longer you train, maybe an example can help internalize this concept.

Let’s leave ML aside for now and imagine we are building something like the Google Maps feature that tells you how busy a store is depending on the time of day. To make this privacy-safe, you can use differential privacy: instead of submitting each user’s actual position down to the centimeter, you add some noise and randomize the position within a 3 meter radius. This will not alter your aggregate significantly, because with many users the noise tends to cancel out and you can still count correctly.

Now the problem is that the noise will also cancel out if we keep asking for your position over and over: averaging the repeated noisy answers gets closer and closer to your true position. This is what happens with DP-SGD too, the longer you keep training.
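
The effect is easy to simulate. The sketch below is a toy model, not the actual mechanism any maps product uses: it assumes a one-dimensional position, Gaussian noise with a 3 meter standard deviation, and function names made up for the example.

```python
import random


def noisy_position(true_position: float, noise_std: float, rng: random.Random) -> float:
    """One randomized report: the true position plus Gaussian noise."""
    return true_position + rng.gauss(0.0, noise_std)


def estimate_from_repeated_queries(
    true_position: float, noise_std: float, n_queries: int, rng: random.Random
) -> float:
    """Average n_queries independent noisy reports of the same position."""
    reports = [noisy_position(true_position, noise_std, rng) for _ in range(n_queries)]
    return sum(reports) / len(reports)


rng = random.Random(0)  # fixed seed so the run is reproducible
true_position = 12.0    # hypothetical position in meters
for n_queries in (1, 100, 10_000):
    estimate = estimate_from_repeated_queries(true_position, 3.0, n_queries, rng)
    print(f"{n_queries:>6} queries -> error = {abs(estimate - true_position):.3f} m")
```

The typical error of the average shrinks roughly like 3 / sqrt(n_queries) meters, so every extra query spends a bit more of the user's privacy.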

touqir · February 16, 2021, 1:24pm

The more times you release differentially private estimates, the larger the accumulated privacy loss becomes. This is a common phenomenon in differential privacy and is described by a very useful property known in the literature as composition. Many well-known differentially private algorithms rely on various composition theorems to bound the accumulated privacy loss. If you are interested, you can take a look at this paper for more information on composition theorems: http://proceedings.mlr.press/v37/kairouz15.pdf
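
For orientation, these are the two classical composition bounds as usually stated in textbooks (the linked paper derives a tighter bound, which is not reproduced here). Basic composition: if each mechanism M_i is (ε_i, δ_i)-DP, running k of them is DP with

```latex
\varepsilon_{\text{total}} = \sum_{i=1}^{k} \varepsilon_i,
\qquad
\delta_{\text{total}} = \sum_{i=1}^{k} \delta_i
```

Advanced composition: running k mechanisms that are each (ε, δ)-DP is, for any δ' > 0, (ε', kδ + δ')-DP with

```latex
\varepsilon' = \varepsilon \sqrt{2k \ln(1/\delta')} + k \, \varepsilon \, (e^{\varepsilon} - 1)
```

For small ε the first term dominates, which is where the roughly square-root growth in the number of releases comes from.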