When I load the model for inference, it always consumes a lot of GPU memory even though the model itself is small. For example, the model is 70 MB (encoder + decoder + attention, with ResNet-50 as the backbone for the encoder), but it occupies approximately 1 GB of GPU memory.
Is there any way to reduce GPU memory usage when loading the model for inference?
Thanks all for your support.
With float16: I have used Apex (mixed precision) for training, and the model size is reduced to 50 MB, but when I load the model onto the GPU it still consumes 950 MB.
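One thing worth checking is that half precision really does halve the memory the weights themselves take. A minimal sketch (using a hypothetical stand-in `nn.Sequential` model, not the actual encoder/decoder) that counts parameter bytes before and after `.half()`:

```python
import torch
import torch.nn as nn

# Hypothetical small model standing in for the encoder/decoder;
# the point is only how .half() affects parameter memory.
model = nn.Sequential(nn.Linear(1024, 1024), nn.Linear(1024, 1024))

def param_bytes(m: nn.Module) -> int:
    # Total bytes occupied by the model's parameters.
    return sum(p.numel() * p.element_size() for p in m.parameters())

fp32_bytes = param_bytes(model)
model = model.half()  # convert all weights to float16
fp16_bytes = param_bytes(model)

print(f"fp32: {fp32_bytes / 2**20:.2f} MB, fp16: {fp16_bytes / 2**20:.2f} MB")
```

If the weights drop from ~100 MB to ~50 MB but the reported GPU usage barely moves, the remainder is not the weights.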
Looks like I need to read more about the CUDA context for this issue.
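A way to see this is to compare what PyTorch's tensors actually hold against what nvidia-smi reports: the gap is mostly the CUDA context plus loaded cuDNN/cuBLAS kernels, which can easily be several hundred MB and does not shrink with the model. A rough sketch (the `nn.Linear` model here is just an illustrative placeholder):

```python
import torch

if torch.cuda.is_available():
    model = torch.nn.Linear(1024, 1024).cuda()
    allocated = torch.cuda.memory_allocated()  # bytes held by live tensors
    reserved = torch.cuda.memory_reserved()    # bytes held by PyTorch's caching allocator
    print(f"tensors: {allocated / 2**20:.1f} MB, allocator: {reserved / 2**20:.1f} MB")
    # nvidia-smi shows allocator + CUDA context, so its number will be larger;
    # the difference is fixed per-process overhead you cannot remove by
    # making the model smaller.
else:
    print("No GPU available; run on a CUDA machine to see the breakdown.")
```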