Reduce GPU memory use during inference

Hello, are there any methods to reduce GPU memory use during inference? I have already wrapped the forward pass in torch.no_grad. Thanks!
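For reference, this is roughly what I do now; torch.inference_mode is a drop-in replacement for torch.no_grad that also skips autograd's version-counter bookkeeping, though it does not change peak activation memory much (the model and input below are just placeholders):

```python
import torch

# Placeholder model and input, only to show the pattern.
model = torch.nn.Linear(1024, 1024).cuda().eval()
x = torch.randn(8, 1024, device="cuda")

with torch.no_grad():          # what I currently use
    y = model(x)

with torch.inference_mode():   # stricter variant: no autograd state kept at all
    y = model(x)
```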

I tried fusing the batch norms into the convolutions (sketch below), but the memory reduction was very limited. In theory, we could forward the model one layer at a time and free each layer's GPU memory afterwards, but I do not know how to implement this.
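This is roughly the fusion I tried, using torch.nn.utils.fusion.fuse_conv_bn_eval to fold the batch-norm statistics into the conv weights. It drops the BN parameters and buffers, but since activations usually dominate inference memory, the saving is small (the layer shapes here are placeholders):

```python
import torch
from torch.nn.utils.fusion import fuse_conv_bn_eval

# Toy conv + batch-norm pair; both must be in eval mode for fusion.
conv = torch.nn.Conv2d(3, 16, kernel_size=3, padding=1).eval()
bn = torch.nn.BatchNorm2d(16).eval()

# Fold the BN statistics into the conv weights and bias,
# so the BN layer can be dropped at inference time.
fused = fuse_conv_bn_eval(conv, bn)

x = torch.randn(1, 3, 32, 32)
with torch.no_grad():
    assert torch.allclose(fused(x), bn(conv(x)), atol=1e-5)
```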

EDIT: I made a mistake earlier; forwarding one layer at a time is indeed one possible solution.
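A rough sketch of what I mean, assuming a sequential model whose layers start on the CPU, so that only the layer currently executing is held on the GPU (the function and model names are illustrative):

```python
import torch

def forward_offloaded(layers, x):
    """Forward x through layers, keeping only the active layer on the GPU.

    Assumes the modules start on the CPU and can be iterated in order,
    e.g. an nn.Sequential.
    """
    x = x.cuda()
    with torch.no_grad():
        for layer in layers:
            layer.cuda()      # load this layer's weights onto the GPU
            x = layer(x)      # run it
            layer.cpu()       # offload the weights again
    torch.cuda.empty_cache()  # return cached blocks to the driver
    return x

# Placeholder usage.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 1024),
).eval()
out = forward_offloaded(model, torch.randn(8, 1024))
```

The obvious trade-off is the extra CPU-to-GPU transfer of every layer's weights on each forward pass, so this only pays off when the parameters, not the activations, are what exceed device memory.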