First of all, thanks for trying out Opacus; we do appreciate it.
To better understand the concept of (ε, 𝛿)-differential privacy, I suggest starting with the FAQ section on our website; we have a paragraph on that: FAQ · Opacus
tl;dr - Epsilon bounds the multiplicative difference between the output distributions produced on two datasets that differ in a single example. In other words, epsilon is a measure of how much difference a single example can make under the worst possible circumstances.
Delta, in turn, defines the probability of failure: the chance that we don’t uphold the privacy guarantee defined by epsilon.
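For reference, this is the standard formal statement of the definition: a randomized mechanism M is (ε, 𝛿)-differentially private if, for any two datasets D and D′ that differ in a single example and for any set of outputs S,

```latex
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[M(D') \in S] + \delta
```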
Now, the question of threshold values for a “good privacy guarantee” is a tricky one.
For delta, you typically want it to be orders of magnitude less than 1/N, where N is the number of entries in your dataset. Otherwise you don’t protect against the full release of one entry.
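As a quick sketch of that rule of thumb (plain Python; the dataset size here is just an illustration):

```python
# Rule of thumb: pick delta well below 1/N.
n = 60_000    # e.g., the MNIST training set size
delta = 1e-6  # about an order of magnitude below 1/60_000 ≈ 1.7e-5
assert delta < 1 / n
```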
For epsilon, the answer is “it depends”.
I’d suggest looking at papers to see what’s considered reasonable for your task.
For example, on MNIST we’re able to get acceptable results with ε=1.19 and good results with ε=7.
In another example, the paper “Tempered Sigmoid Activations for Deep Learning with Differential Privacy” by Papernot et al. uses ε=3 for MNIST and ε=7 for CIFAR10.
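To get some intuition for these magnitudes, remember that e^ε bounds the worst-case likelihood ratio a single example can induce; a quick back-of-the-envelope check:

```python
import math

# e^eps is the worst-case multiplicative change in the probability
# of any outcome when a single example is added or removed.
for eps in [1.19, 3.0, 7.0]:
    print(f"eps = {eps}: worst-case likelihood ratio ≈ {math.exp(eps):.0f}")
# eps = 1.19: ratio ≈ 3
# eps = 3.0:  ratio ≈ 20
# eps = 7.0:  ratio ≈ 1097
```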
There’s also some evidence that the epsilon values reported by DP-SGD are on the very pessimistic end, and that the privacy observed in practice is much better (roughly shrinking the effective epsilon by a factor of 10).
To summarise this part: you’d want to set your 𝛿 << 1/N and keep your ε in the single digits (but it depends).
As for the training process, the short answer is yes: the more training iterations you perform, the weaker the privacy guarantee is. Bear in mind, however, that the relation is sub-linear.
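Here’s a minimal sketch of that sub-linear growth, assuming a recent Opacus version where `opacus.accountants.RDPAccountant` is available; the noise multiplier, sample rate, and delta are made-up illustration values, not recommendations:

```python
from opacus.accountants import RDPAccountant

noise_multiplier = 1.0      # sigma in DP-SGD
sample_rate = 256 / 60_000  # batch size / dataset size
delta = 1e-6                # << 1/N for N = 60_000

accountant = RDPAccountant()
steps_done = 0
for target in [1_000, 2_000, 4_000, 8_000]:
    # advance the accountant to the target step count
    for _ in range(target - steps_done):
        accountant.step(noise_multiplier=noise_multiplier, sample_rate=sample_rate)
    steps_done = target
    print(f"steps = {target}: eps ≈ {accountant.get_epsilon(delta=delta):.2f}")
# Doubling the number of steps grows epsilon by noticeably less than 2x
# (roughly like the square root of the step count).
```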
For more math behind it, I would suggest looking into the original DP-SGD paper (https://arxiv.org/pdf/1607.00133.pdf), specifically Section 3.1, “Differentially Private SGD Algorithm”, subsection “Privacy Accounting”.