Question about Oversampling

Hi All,

For an imbalanced dataset, can we first use oversampling to balance the two classes, and then use Opacus for DP training over the artificially balanced dataset?

Hi @PeterCheng and thanks for your question.

You can certainly do that, but it affects how you should interpret the privacy guarantee of the trained model.
Opacus provides privacy guarantees with respect to each individual sample in the dataset. We also use privacy amplification by subsampling, i.e. the epsilon guarantee is directly tied to the probability of a sample being drawn for a minibatch.

When you oversample, you essentially increase the sampling rate for each duplicated data record.
To get an eps estimate for the oversampled instances, you need to modify the PrivacyEngine.get_privacy_spent method so that it uses the adjusted sample rate.
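
As a rough sketch of what an adjusted sample rate could look like: assume each copy of a record is included in a minibatch independently with probability q (Poisson-style sampling). Then a record duplicated k times appears in a given batch with probability 1 - (1 - q)^k. The helper name `effective_sample_rate` and the numbers below are made up for illustration and are not part of Opacus.

```python
def effective_sample_rate(sample_rate: float, copies: int) -> float:
    """Probability that at least one of `copies` duplicates lands in a batch.

    Assumption: each copy is included independently with probability
    `sample_rate`. This helper is hypothetical, not an Opacus function.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be in [0, 1]")
    if copies < 1:
        raise ValueError("copies must be >= 1")
    return 1.0 - (1.0 - sample_rate) ** copies


# Hypothetical numbers: batches of 64 over a balanced dataset of 10,000 rows,
# where each minority-class record was duplicated 5 times.
q = 64 / 10_000
q_minority = effective_sample_rate(q, copies=5)
print(f"nominal rate: {q:.4f}, minority record rate: {q_minority:.4f}")
```

The larger rate would then stand in for the nominal sample rate when estimating eps for the oversampled records. Note that this only models how often a record is seen; it does not account for several copies of the same record landing in one batch.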

Thank you for the detailed explanation. I have a better understanding now.