I’ve integrated Opacus into my Lightning training script for an NLP application. I’m running into GPU out-of-memory errors that I haven’t been able to resolve, so I’d like to use BatchMemoryManager to keep a large logical batch size while capping the physical batch size that actually hits the GPU. My question is:
How do I use BatchMemoryManager with Lightning? Is that currently supported?
My current implementation follows the same skeleton as the Opacus Lightning tutorial (examples/mnist_lightning.py in the pytorch/opacus repo on GitHub).
Thanks in advance for any help