Hello, are there any methods to reduce GPU memory usage during inference? I am already using torch.no_grad(). Thanks!
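For reference, here is a minimal sketch of what I mean by using torch.no_grad() (the model and shapes are just a toy example): wrapping the forward pass stops autograd from storing intermediate activations, which is where much of the inference-time memory goes.

```python
import torch
import torch.nn as nn

# Toy model; any nn.Module works the same way.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
model.eval()  # disable dropout / batch-norm running-stat updates

x = torch.randn(2, 8)

# no_grad() tells autograd not to build a graph or keep
# activations for a backward pass during this forward.
with torch.no_grad():
    out = model(x)

# The output carries no autograd history.
assert not out.requires_grad
```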
I tried fusing batch norm and convolution layers, but the reduction was very limited. In theory, we could forward one layer at a time and free that layer's GPU memory afterwards, but I do not know how to do this.
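To make the idea concrete, here is a rough sketch of layer-by-layer offloading, assuming the model is an nn.Sequential whose children can be run independently. The helper name forward_one_layer_at_a_time is made up for illustration; weights live on the CPU and only the active layer is moved to the GPU.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Toy layer stack; in practice these would be your model's children.
layers = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
layers.eval()

def forward_one_layer_at_a_time(layers, x):
    # Keep all weights on the CPU; move only the active layer over.
    x = x.to(device)
    for layer in layers:
        layer.to(device)
        with torch.no_grad():
            x = layer(x)
        layer.to("cpu")  # move this layer's weights off the GPU
        if device.type == "cuda":
            torch.cuda.empty_cache()  # return cached blocks to the driver
    return x

x = torch.randn(2, 8)
y = forward_one_layer_at_a_time(layers, x)
```

Note the trade-off: the repeated host-to-device copies make this much slower than keeping the whole model resident, so it only pays off when the model does not fit in GPU memory.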
EDIT: I made a mistake; forwarding one layer at a time is indeed one possible solution.