Hi, I compared two tests:
Test 1: resnet20 on CIFAR-10 with the PrivacyEngine, clipping norm set to 10M. This should be equivalent to not doing any clipping at all.
Test 2: resnet20 on CIFAR-10 without the PrivacyEngine (noise multiplier set to 0), with exactly the same hyperparameters as test 1.
Test 2 quickly reached 92% accuracy, while test 1 struggled to reach 85%. I then wrote another test where I created two models, one for each of the tests above, and trained them on the same data (with the same trainloader) simultaneously. Model 1 has optimizer 1, which is attached to a PrivacyEngine, while model 2 has optimizer 2, which is just a normal SGD optimizer.
The code looks something like this:
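(A trimmed-down sketch, assuming the pre-1.0 attach-style Opacus API; resnet20(), the trainloader, and the hyperparameter values are placeholders for my actual setup, and I assume noise multiplier 0 so that only clipping/aggregation could cause differences.)

```python
import torch.nn as nn
import torch.optim as optim
from opacus import PrivacyEngine

# Two copies of the same architecture, starting from identical weights.
model1 = resnet20()  # placeholder for the actual model constructor; trained through the PrivacyEngine
model2 = resnet20()  # trained with plain SGD
model2.load_state_dict(model1.state_dict())

criterion = nn.CrossEntropyLoss()
optimizer1 = optim.SGD(model1.parameters(), lr=0.1, momentum=0.9)
optimizer2 = optim.SGD(model2.parameters(), lr=0.1, momentum=0.9)

# PrivacyEngine attached to optimizer1 only: no noise, and a clipping norm
# so large that it should never actually clip anything.
privacy_engine = PrivacyEngine(
    model1,
    batch_size=128,
    sample_size=50_000,          # CIFAR-10 training set size
    alphas=range(2, 32),
    noise_multiplier=0.0,        # assumption: no noise in either test
    max_grad_norm=10_000_000.0,  # effectively no clipping
)
privacy_engine.attach(optimizer1)

for images, labels in trainloader:  # trainloader is the shared CIFAR-10 loader
    optimizer1.zero_grad()
    optimizer2.zero_grad()

    criterion(model1(images), labels).backward()
    criterion(model2(images), labels).backward()

    # At this point the .grad tensors of the two models match exactly.
    optimizer1.step()
    optimizer2.step()
    # After step(), the .grad tensors (and the weights) start to diverge.
```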
Before we call the step() functions, the param.grad tensors are exactly the same between the two models. However, after we call the step() functions, there is roughly a 3% difference between the param.grad values of the two models.
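For concreteness, the ~3% figure comes from a per-parameter relative comparison along these lines (a rough sketch of what I checked, not my exact code):

```python
import torch

# Compare the post-step() gradients of the two models parameter by parameter.
with torch.no_grad():
    for (name, p1), (_, p2) in zip(model1.named_parameters(), model2.named_parameters()):
        rel_diff = (p1.grad - p2.grad).norm() / p2.grad.norm()
        print(f"{name}: relative grad difference {rel_diff.item():.4f}")  # roughly 0.03
```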
Is this because PyTorch's default way of computing gradients is different from Opacus's, even when the clipping value is 10 million? Or is it because of numerical precision loss during Opacus's computations?