Opacus: How to temporarily disable the backward hook for multiple backward passes

Hello,

I’m using Opacus to compute per-sample gradients w.r.t. the parameters. However, I also need to compute the per-sample gradient of each logit w.r.t. the input, so I need to backpropagate several times. A minimal example is as follows:

import torch
from opacus.grad_sample import GradSampleModule
from torch.autograd import grad

class Model(torch.nn.Module):
    def __init__(self, inputSize, outputSize):
        super(Model, self).__init__()
        self.linear = torch.nn.Linear(inputSize, outputSize)

    def forward(self, x):
        out = self.linear(x)
        return out

num_classes = 2
bs = 16

model = Model(10,num_classes) 
op_model = GradSampleModule(model)

op_model.zero_grad()
X = torch.rand(bs,10)
X.requires_grad = True
grad_x = torch.zeros(bs,num_classes,10)

output = op_model(X) # bs * num_classes
for c in range(num_classes):
    grad_x[:,c,:] = grad(outputs=output[:,c], inputs=X,
                         grad_outputs=torch.ones_like(output[:,c]), retain_graph=True)[0]

This raises the following error:

IndexError: pop from empty list

It seems that Opacus won’t work if the number of backward passes is greater than the number of forward passes, according to this post. However, in my use case (adversarial training), at some point I need to backpropagate several times to compute the gradient w.r.t. the input (not the parameters). Is it possible to disable the grad_sample functionality temporarily and enable it again afterwards? I would appreciate any suggestions.

Thanks!

Hi @Yuancheng_Xu!

The hook for computing grad_sample doesn’t work correctly when there is more than one backward pass per forward pass.
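As a rough picture of where the "pop from empty list" message can come from, here is a toy model. It is not Opacus's actual code, and the class and attribute names are made up for illustration. It assumes each forward pass stores something that each backward pass later consumes, so a second backward pass finds nothing left.

```python
class ToyHookedLayer:
    """Hypothetical stand-in: one stored activation per forward pass."""

    def __init__(self):
        self.activations = []

    def forward(self, x):
        # A forward hook would record the input for later use.
        self.activations.append(x)
        return x

    def backward(self):
        # A backward hook would consume the most recent recorded input.
        return self.activations.pop()


layer = ToyHookedLayer()
layer.forward([1.0, 2.0])   # one forward pass
layer.backward()            # first backward pass: fine

try:
    layer.backward()        # second backward pass: nothing left to pop
except IndexError as err:
    message = str(err)      # "pop from empty list"
```

In the example from the question there is one forward pass and one backward pass per class, so the second iteration of the loop is the one that fails.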

I’m not sure I completely understand your use case, but I think you can do the following:

# initialize the vanilla model (Model, bs, num_classes and grad are defined as in the question)
model = Model(10,num_classes)
model.zero_grad()
X = torch.rand(bs,10)
X.requires_grad = True
grad_x = torch.zeros(bs,num_classes,10)

# compute the output and doutput/dinput
output = model(X) # bs * num_classes
for c in range(num_classes):
    grad_x[:,c,:] = grad(
        outputs=output[:,c],
        inputs=X,
        grad_outputs=torch.ones_like(output[:,c]),
        retain_graph=True
    )[0]

# add the hook & evaluate the model one more time to get the grad_sample
dp_model = GradSampleModule(model)
output = dp_model(X)

# ... apply gradient step with grad_sample

# remove hooks before the next iteration
# if you don't remove the hooks, model(X) will still run with the grad_sample hooks attached, leading to the exception above
# also, you won't be able to re-create GradSampleModule(model) until the hooks are removed
dp_model.remove_hooks() 

Note that using gradients without properly added noise, as in this case, could break any DP guarantees you were expecting.


Thanks for the reply!

My use case is that I need to alternate between computing a) grad_sample and b) multiple backprops for the gradient w.r.t. the inputs. I have figured out a way to do it: call ddp_model.module.enable_hooks() before a) and ddp_model.module.disable_hooks() before b). It works fine for me.
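For anyone who wants to avoid forgetting the re-enable step, the toggle can be wrapped in a small context manager. This is only a sketch: hooks_disabled is a hypothetical helper, not part of Opacus, and it assumes the wrapped object exposes the disable_hooks() and enable_hooks() methods mentioned above. FakeGradSampleModule is a stand-in so the snippet runs without Opacus installed.

```python
from contextlib import contextmanager


@contextmanager
def hooks_disabled(gs_module):
    """Temporarily switch off grad_sample hooks (hypothetical helper).

    Assumes gs_module has disable_hooks() and enable_hooks().
    """
    gs_module.disable_hooks()
    try:
        yield gs_module
    finally:
        # Re-enable even if the wrapped code raises.
        gs_module.enable_hooks()


class FakeGradSampleModule:
    """Stand-in used only to exercise the helper; not the real class."""

    def __init__(self):
        self.hooks_enabled = True

    def disable_hooks(self):
        self.hooks_enabled = False

    def enable_hooks(self):
        self.hooks_enabled = True


fake = FakeGradSampleModule()
with hooks_disabled(fake):
    inside = fake.hooks_enabled  # hooks off: run the input-gradient loop here
after = fake.hooks_enabled       # hooks on again: run the grad_sample backward here
```

With a real wrapped model, the loop over classes from the question would go inside the with block, and the single backward pass for grad_sample would come after it. Note that this helper always turns the hooks back on at exit, even if they were already off beforehand.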
