CPU Memory Allocation/Virtual Memory Allocation per GPU

Is there some way to reduce the CPU memory allocation on init of torch?

When we run torch.is_available() it allocates 11GB for one GPU and 44.2GB when we use six GPUs.

Then when we start the workers in the training loop that CPU allocation is copied to each worker, so we see this massive memory use.

The 11GB is much bigger than the model + n x workers * batch size we roughly expect things to be.

Sorry if this is dead obvious but we found 3 open issues on this and no obvious smoking gun of what we might be doing wrong or is 11GB of CPU memory the expected overhead of using a GPU?


P.S. Some small technical details.
Using the current Nvidia Docker image for Pytorch
(FROM nvcr.io/nvidia/pytorch:22.08-py3)
Model and Training loop is Detectron2 Masked RCNN

Is this the correct area to post this? Should I move the question somewhere else?

No, that’s the right place. Sorry for not responding before as I’ve seen this post but it dropped from my list. Will get back to you after checking with our CUDA team.