Hey everyone, I'm trying to accumulate gradients during training to save GPU memory.
The training loop works fine without the privacy_engine:
opt.zero_grad()
for i, (input, target) in enumerate(dataset):
    pred = net(input)
    loss = crit(pred, target)
    # one graph is created here
    loss.backward()
    # graph is cleared here
    if (i + 1) % 10 == 0:
        # every 10 iterations of batches of size 10
        opt.step()
        opt.zero_grad()
However, when I attach the privacy_engine to the optimizer, I get a CUDA out-of-memory error.
Does anyone know how to solve this problem? Thanks in advance.
Hi!
It is expected that Opacus has a certain memory overhead. At the very least, we have to store per-sample gradients for all model parameters, which alone increases the memory required to store gradients by a factor of the batch size.
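For a rough sense of scale, here's a back-of-the-envelope sketch (the parameter count and batch size are made-up illustrative numbers, not anything specific to your model):

    # Illustrative memory estimate for per-sample gradients (fp32).
    n_params = 10_000_000   # hypothetical model with 10M parameters
    batch_size = 100        # hypothetical physical batch size
    bytes_per_float = 4

    regular_grads_mb = n_params * bytes_per_float / 2**20
    per_sample_grads_mb = batch_size * n_params * bytes_per_float / 2**20

    print(f"regular gradients:    {regular_grads_mb:.0f} MB")     # ~38 MB
    print(f"per-sample gradients: {per_sample_grads_mb:.0f} MB")  # ~3815 MB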
To address this, I suggest using the optimizer's virtual_step() method.
It performs gradient clipping and accumulation (thus saving memory), but doesn't take the actual optimizer step.
Your code would look something like this:
opt.zero_grad()
for i, (input, target) in enumerate(dataset):
    pred = net(input)
    loss = crit(pred, target)
    # one graph is created here
    loss.backward()
    # graph is cleared here
    if (i + 1) % 10 == 0:
        # every 10 iterations of batches of size 10
        opt.step()
        opt.zero_grad()
    else:
        opt.virtual_step()
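For completeness, here's a minimal sketch of how the engine gets attached, assuming the pre-1.0 Opacus API (where PrivacyEngine takes the model directly and attach() wraps the optimizer); all hyperparameter values below are placeholders, not recommendations:

    from opacus import PrivacyEngine

    # Placeholder privacy hyperparameters; tune these for your own setup.
    privacy_engine = PrivacyEngine(
        net,
        batch_size=100,        # logical batch: 10 virtual steps x physical batch of 10
        sample_size=len(dataset),
        alphas=[1 + x / 10.0 for x in range(1, 100)],
        noise_multiplier=1.0,
        max_grad_norm=1.0,
    )
    privacy_engine.attach(opt)  # after this, opt.virtual_step() is available

With this setup, each virtual_step() clips the current batch's per-sample gradients and folds them into an accumulated sum, so only one physical batch's worth of per-sample gradients has to live in memory at a time.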