Hello. I have 2 separate processes; each one loads a model and waits for an image from a socket to run inference on. The problem is that each process uses 2 GB of GPU memory and 4 GB of host memory (4 GB of GPU memory and 8 GB of host memory in total). But when I load both models in the same process, that process uses only 2.1 GB of GPU memory and 4 GB of host memory. Why is this happening? Is there a way to solve this?
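Each process creates its own CUDA context and loads the libraries into host memory, so the memory increase is expected. When both models live in the same process they share a single context and a single copy of the libraries, which is why the combined footprint is only slightly larger than that of one model.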
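To make the numbers concrete, here is a rough back-of-the-envelope sketch using the figures from the question. It assumes GPU usage is a fixed per-process overhead (the CUDA context) plus a per-model cost, and that both models are about the same size. Neither assumption is confirmed in the thread, and the function name `estimate_gpu_gb` is just illustrative.

```python
# GPU memory figures reported in the question, in GB.
one_model_per_process = 2.0    # a process holding a single model
two_models_one_process = 2.1   # a process holding both models

# Assumption: usage = fixed per-process overhead + per-model cost,
# and both models cost roughly the same.
per_model = two_models_one_process - one_model_per_process
per_process_overhead = one_model_per_process - per_model


def estimate_gpu_gb(num_processes, models_per_process):
    """Estimated total GPU memory (GB) under the assumptions above."""
    return num_processes * (per_process_overhead + models_per_process * per_model)


print("per model:", round(per_model, 2))
print("per-process overhead:", round(per_process_overhead, 2))
print("2 processes x 1 model:", round(estimate_gpu_gb(2, 1), 2))
print("1 process x 2 models:", round(estimate_gpu_gb(1, 2), 2))
```

Under these assumptions most of the 2 GB per process is fixed overhead rather than the model itself, so serving both models from one process (for example, one socket listener that dispatches to either model) avoids paying that overhead twice.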
I have a question, please help me.