DP and Transfer Learning

Andreas_Kopp · February 10, 2021, 4:41pm

Hi Opacus community,
I am looking for experiences / best practices for using DP with transfer learning. Let’s say a hospital decides to build a DP image classification model based on patient data.

Three scenarios come to mind:

1. baseline network pre-trained on a public dataset (from a similar domain, e.g. x-rays)
2. baseline network pre-trained on existing data of the hospital (classic, non-private way)
3. baseline network pre-trained with DP on existing data of the hospital

I would assume that all three approaches could make sense, because fine-tuning probably requires fewer epochs than training from scratch (and hence spends less privacy budget).
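To put a rough number on that intuition, here is a back-of-the-envelope sketch in plain Python. It assumes full-batch noisy gradient steps with the Gaussian mechanism and uses the standard Rényi DP bound (order alpha costs alpha / (2 * sigma^2) per step, costs add up across steps). It ignores the subsampling amplification that Opacus's accountant exploits, so the absolute values are much looser than what Opacus would report; only the trend matters. The function name and the parameter values are hypothetical.

```python
import math

def gaussian_epsilon(steps, noise_multiplier, delta):
    """Loose (epsilon, delta) bound for `steps` Gaussian-mechanism releases.

    Assumption: sensitivity 1, no subsampling amplification.
    RDP at order alpha is steps * alpha / (2 * sigma^2); it is converted
    to (epsilon, delta)-DP by adding log(1/delta) / (alpha - 1).
    """
    best = float("inf")
    for i in range(1, 2000):
        alpha = 1.0 + i / 10.0  # simple grid over RDP orders
        rdp = steps * alpha / (2.0 * noise_multiplier ** 2)
        eps = rdp + math.log(1.0 / delta) / (alpha - 1.0)
        best = min(best, eps)
    return best

# Fewer steps at the same noise level give a smaller epsilon.
for steps in (10, 50, 200):
    eps = gaussian_epsilon(steps, noise_multiplier=4.0, delta=1e-5)
    print(f"steps={steps:4d}  epsilon<={eps:.2f}")
```

The bound grows with the number of steps, which is why a pre-trained baseline that converges in fewer epochs leaves more of the budget unspent.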

I would appreciate hearing your experiences with, or thoughts on, these scenarios.

Thanks
Andreas

karthikprasad · February 10, 2021, 7:26pm

Hello @Andreas_Kopp !
With transfer learning, there are two separate datasets to consider: one used for training the baseline model and another for fine-tuning/training your final model.

In scenario 1, the privacy of the public dataset is not preserved, but the privacy of the data used to train your final model is preserved if you train with DP.

In scenario 2, just as in scenario 1, the privacy of the dataset used to train your baseline model is NOT preserved; only the privacy of the data used to train your final model is. If preserving the privacy of the baseline training data is not important for your use case, this might result in better accuracy than scenario 1 (depending, of course, on your task as well as the size and distribution of this data compared to the public dataset).

In scenario 3, the privacy of all the data is preserved, but the accuracy will be very low.
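For illustration, here is a minimal sketch of the DP fine-tuning stage that all three scenarios share: the pre-trained baseline is frozen and only a small linear head is trained, with per-example gradient clipping and Gaussian noise. This is plain Python, not the Opacus API (Opacus automates the per-sample gradients, clipping and noise for PyTorch models). `frozen_backbone`, the toy data and all hyperparameters are made up for the example.

```python
import math
import random

def frozen_backbone(pixels):
    """Hypothetical stand-in for a pre-trained, frozen feature extractor."""
    return [sum(pixels) / len(pixels), max(pixels), 1.0]

def clip(grad, max_norm):
    """Scale a per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / (norm + 1e-12))
    return [g * scale for g in grad]

def dp_sgd_step(weights, features, labels, lr, max_norm, noise_multiplier, rng):
    """One DP-SGD step on a linear head (squared-error loss)."""
    summed = [0.0] * len(weights)
    for x, y in zip(features, labels):
        pred = sum(w * xi for w, xi in zip(weights, x))
        grad = [2.0 * (pred - y) * xi for xi in x]  # per-example gradient
        clipped = clip(grad, max_norm)              # bound each example's influence
        summed = [s + g for s, g in zip(summed, clipped)]
    # Gaussian noise scaled to the clipping bound, added once per step.
    noisy = [s + rng.gauss(0.0, noise_multiplier * max_norm) for s in summed]
    n = len(features)
    return [w - lr * g / n for w, g in zip(weights, noisy)]

# Toy "private" fine-tuning data; only the head's weights are updated.
rng = random.Random(0)
images = [[rng.random() for _ in range(4)] for _ in range(32)]
labels = [float(sum(img) > 2.0) for img in images]
features = [frozen_backbone(img) for img in images]

weights = [0.0] * len(features[0])
for _ in range(20):
    weights = dp_sgd_step(weights, features, labels, lr=0.1,
                          max_norm=1.0, noise_multiplier=1.1, rng=rng)
print(weights)
```

Only the data passed to `dp_sgd_step` gets a DP guarantee here. Whatever was used to produce the backbone is protected only if that earlier training was itself done with DP, which is the difference between scenarios 2 and 3.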

Hope this adds some clarity.