BatchMemoryManager with Opacus in Lightning

I’ve integrated Opacus into my Lightning training script for an NLP application, and I’m hitting GPU out-of-memory errors that I can’t resolve. So I’m looking to use BatchMemoryManager, which should let me keep a large logical batch size while bounding the physical batch size that actually occupies GPU memory. My question is:

How do I implement BatchMemoryManager with Lightning? Is that supported currently?

My current implementation follows the same skeleton as the Opacus Lightning tutorial here: opacus/examples/ at main · pytorch/opacus · GitHub

Thanks in advance for any help 🙂
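(For readers unfamiliar with it: the idea behind BatchMemoryManager is to keep the logical batch size the optimizer sees, but feed the model smaller physical chunks. A minimal pure-Python sketch of that splitting idea, independent of Opacus — the function name here is made up for illustration:)

```python
def split_into_physical_batches(batch, max_physical_batch_size):
    """Yield chunks of at most max_physical_batch_size items, flagging the
    last chunk so the caller knows when to take the real optimizer step."""
    chunks = [
        batch[i:i + max_physical_batch_size]
        for i in range(0, len(batch), max_physical_batch_size)
    ]
    for i, chunk in enumerate(chunks):
        yield chunk, i == len(chunks) - 1


# A logical batch of 10 samples with a physical cap of 4:
logical_batch = list(range(10))
steps = list(split_into_physical_batches(logical_batch, 4))
# chunks of sizes 4, 4, 2; only the last chunk would trigger optimizer.step()
```

Opacus implements the same idea at the dataloader/optimizer level, so the training loop itself stays unchanged.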

I’m thinking it could be implemented with a custom Lightning Loop that activates BatchMemoryManager, as explained here: Train anything with Lightning custom Loops | by PyTorch Lightning team | PyTorch Lightning Developer Blog

Has anybody done this before?

Hi there, may I ask how you eventually did this?

I think I solved this by defining train_dataloader() inside the LightningModule: there I can access self.optimizers() and pass the optimizer to wrap_data_loader() from Opacus.