Hey everyone, I'm trying to accumulate gradients during training to save GPU memory.
The training loop works fine without the privacy_engine:
opt.zero_grad()
for i, (input, target) in enumerate(dataset):
    pred = net(input)
    loss = crit(pred, target)
    # one graph is created here
    loss.backward()
    # graph is cleared here
    if (i + 1) % 10 == 0:
        # every 10 iterations of batches of size 10
        opt.step()
        opt.zero_grad()
However, when I attach the privacy_engine to the optimizer, I get a CUDA out-of-memory error.
Does anyone know how to solve this problem? Thanks in advance.
Hi!
It is expected that Opacus has a certain memory overhead. At the very least, we have to store per-sample gradients for all model parameters, which alone increases the memory required to store gradients by a factor of the batch size.
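For a rough sense of scale, here's a back-of-the-envelope sketch (the parameter count and batch size are made-up illustrative numbers, not anything specific to your model):

    # Illustrative memory estimate for per-sample gradients (fp32).
    n_params = 10_000_000   # hypothetical model with 10M parameters
    batch_size = 100        # hypothetical physical batch size
    bytes_per_float = 4

    regular_grads_mb = n_params * bytes_per_float / 2**20
    per_sample_grads_mb = batch_size * n_params * bytes_per_float / 2**20

    print(f"regular gradients:    {regular_grads_mb:.0f} MB")     # ~38 MB
    print(f"per-sample gradients: {per_sample_grads_mb:.0f} MB")  # ~3815 MB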
To address this, I suggest using the optimizer's virtual_step() method.
It performs gradient clipping and accumulation (thus saving memory), but doesn't take the actual optimizer step.
Your code would look something like this:
opt.zero_grad()
for i, (input, target) in enumerate(dataset):
    pred = net(input)
    loss = crit(pred, target)
    # one graph is created here
    loss.backward()
    # graph is cleared here
    if (i + 1) % 10 == 0:
        # every 10 iterations of batches of size 10
        opt.step()
        opt.zero_grad()
    else:
        opt.virtual_step()
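For completeness, here's a minimal sketch of how the engine gets attached, assuming the pre-1.0 Opacus API (where PrivacyEngine takes the model directly and attach() wraps the optimizer); all hyperparameter values below are placeholders, not recommendations:

    from opacus import PrivacyEngine

    # Placeholder privacy hyperparameters; tune these for your own setup.
    privacy_engine = PrivacyEngine(
        net,
        batch_size=100,        # logical batch: 10 virtual steps x physical batch of 10
        sample_size=len(dataset),
        alphas=[1 + x / 10.0 for x in range(1, 100)],
        noise_multiplier=1.0,
        max_grad_norm=1.0,
    )
    privacy_engine.attach(opt)  # after this, opt.virtual_step() is available

With this setup, each virtual_step() clips the current batch's per-sample gradients and folds them into an accumulated sum, so only one physical batch's worth of per-sample gradients has to live in memory at a time.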