PyTorch high memory demand

What is the most efficient way to reduce memory consumption on the CPU?
Fewer output channels or a smaller convolution kernel size?
Using depthwise convolutions?
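
To make the options above concrete, here is a rough back-of-envelope sketch of how they change the weight count and the multiply-accumulates (MACs) of a single layer. The layer sizes are made up for illustration, bias terms are ignored, and "depthwise" is assumed to mean a depthwise convolution followed by a 1x1 pointwise convolution. It only counts weights and arithmetic, not activation memory, which also scales with the number of output channels.

```python
def conv_cost(c_in, c_out, k, h_out, w_out):
    """Weights and MACs of a standard k x k convolution (bias ignored)."""
    params = c_in * c_out * k * k
    macs = params * h_out * w_out  # every weight is used once per output position
    return params, macs


def depthwise_separable_cost(c_in, c_out, k, h_out, w_out):
    """Weights and MACs of a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    params = c_in * k * k + c_in * c_out
    macs = params * h_out * w_out
    return params, macs


# Hypothetical layer: 64 -> 128 channels, 3x3 kernel, 56x56 output map.
std_params, std_macs = conv_cost(64, 128, 3, 56, 56)
sep_params, sep_macs = depthwise_separable_cost(64, 128, 3, 56, 56)
half_params, half_macs = conv_cost(64, 64, 3, 56, 56)  # half the output channels

print(f"standard:            {std_params} weights, {std_macs} MACs")
print(f"depthwise separable: {sep_params} weights, {sep_macs} MACs")
print(f"half output channels: {half_params} weights, {half_macs} MACs")
```

For these assumed sizes the standard layer has 73728 weights, the depthwise separable variant 8768, and the half-width standard layer 36864.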

I am training the network on a GPU but want to make predictions on machines that do not have GPUs. I can probably make my network smaller. What should I focus on to improve performance on the CPU?
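
For reference, this is a minimal sketch of the CPU prediction setup I have in mind. The small `nn.Sequential` model and the input size are hypothetical stand-ins for my real network; the point is only that inference runs in eval mode and under `torch.no_grad()`, so no autograd graph is kept alive.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the trained network.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 10),
)

# Move the weights to the CPU and switch layers such as dropout
# and batch norm to inference behaviour.
model = model.to("cpu").eval()

# Hypothetical input: one 3-channel 64x64 image.
batch = torch.randn(1, 3, 64, 64)

# Disable gradient tracking so intermediate activations are not
# retained for a backward pass.
with torch.no_grad():
    logits = model(batch)

print(logits.shape)
```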