Hello, are there any methods to reduce GPU memory usage during inference? I am already using torch.no_grad(). Thanks!
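For reference, here is a minimal sketch of what I mean by using torch.no_grad() (the model and shapes are just a toy example): wrapping the forward pass stops autograd from storing intermediate activations, which is where much of the inference-time memory goes.

```python
import torch
import torch.nn as nn

# Toy model; any nn.Module works the same way.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
model.eval()  # disable dropout / batch-norm running-stat updates

x = torch.randn(2, 8)

# no_grad() tells autograd not to build a graph or keep
# activations for a backward pass during this forward.
with torch.no_grad():
    out = model(x)

# The output carries no autograd history.
assert not out.requires_grad
```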
I tried fusing batch norm and convolution layers, but the reduction was very limited. In theory, we could forward one layer at a time and free that layer's GPU memory afterwards, but I do not know how to do this.
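To make the idea concrete, here is a rough sketch of layer-by-layer offloading, assuming the model is an nn.Sequential whose children can be run independently. The helper name forward_one_layer_at_a_time is made up for illustration; weights live on the CPU and only the active layer is moved to the GPU.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Toy layer stack; in practice these would be your model's children.
layers = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
layers.eval()

def forward_one_layer_at_a_time(layers, x):
    # Keep all weights on the CPU; move only the active layer over.
    x = x.to(device)
    for layer in layers:
        layer.to(device)
        with torch.no_grad():
            x = layer(x)
        layer.to("cpu")  # move this layer's weights off the GPU
        if device.type == "cuda":
            torch.cuda.empty_cache()  # return cached blocks to the driver
    return x

x = torch.randn(2, 8)
y = forward_one_layer_at_a_time(layers, x)
```

Note the trade-off: the repeated host-to-device copies make this much slower than keeping the whole model resident, so it only pays off when the model does not fit in GPU memory.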
EDIT: I made a mistake; forwarding one layer at a time is indeed one possible solution.