High GPU memory usage problem

@albanD, your explanation was very helpful and timely. Thank you!

Is there anything that can be done to trade off that inefficient setup phase, which at times leads to a huge peak memory usage, without affecting normal training (i.e. without reducing batch size, model size, etc.)?

For example, MNIST (28x28) with bs=512 and resnet50 peaks at 6GB on the first pass, then settles to a steady 1GB peak for subsequent epochs - a 6x overhead on the first pass alone!
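For reference, here is roughly how I measure those peaks - a minimal sketch using a synthetic MNIST-shaped batch instead of the real DataLoader, so the exact numbers may differ from my actual script:

```python
import torch
import torch.nn as nn
from torchvision import models

device = torch.device("cuda")
model = models.resnet50(num_classes=10).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Synthetic stand-in for an MNIST batch at bs=512
# (MNIST is 1-channel; expanded to 3 channels for resnet50).
x = torch.randn(512, 3, 28, 28, device=device)
y = torch.randint(0, 10, (512,), device=device)

for epoch in range(3):
    torch.cuda.reset_peak_memory_stats(device)
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    torch.cuda.synchronize(device)
    peak_gb = torch.cuda.max_memory_allocated(device) / 2**30
    print(f"epoch {epoch}: peak {peak_gb:.2f} GB")
```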

Could the computational graph/gradient setup be done in stages and then combined, so that the peak memory usage stays at a smaller multiple of the steady-state peak?
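To make the "in stages" idea concrete, here is a purely hypothetical sketch: run only the very first optimizer step as gradient-accumulated micro-batches, so each graph build/free touches a smaller batch, then continue training at the full batch size. `first_step_in_stages` and `num_stages` are names I made up for illustration, and I don't know whether this would actually keep the allocator's peak lower:

```python
def first_step_in_stages(model, criterion, optimizer, x, y, num_stages=4):
    # Hypothetical workaround: gradient accumulation over micro-batches
    # for the first step only; subsequent steps run at the full bs.
    optimizer.zero_grad()
    for xs, ys in zip(x.chunk(num_stages), y.chunk(num_stages)):
        # Scale the loss so the accumulated gradients match a full-batch step.
        loss = criterion(model(xs), ys) / num_stages
        loss.backward()  # graph for this micro-batch is built and freed here
    optimizer.step()
```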
