CPU memory cost during training on GPU

Hello everyone, I have a question: when we train on a GPU, there is also some memory cost on the CPU.
Can anyone explain what this CPU memory cost is? Image buffers? Ops? Are some ops not computed on the GPU?
Thank you!

There is some memory used by the CUDA driver/manager.
Tensor metadata is also stored in CPU memory, but it should be very small.
Your data loading uses CPU memory when loading from disk: anything that is loaded from disk has to be in CPU memory before being sent to the GPU.
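The point above can be sketched in a few lines (a minimal illustration; the random tensor stands in for an image batch decoded from disk, and the device check is an assumption so it also runs on CPU-only machines):

```python
import torch

# A batch loaded from disk (e.g. by a DataLoader) always starts out
# as a CPU tensor; only an explicit .to(device) copies it to the GPU.
batch = torch.randn(8, 3, 640, 360)   # stand-in for a decoded image batch
assert batch.device.type == "cpu"     # data begins life in CPU memory

device = "cuda" if torch.cuda.is_available() else "cpu"
batch = batch.to(device)              # host-to-device copy when a GPU exists
```

So even when all compute runs on the GPU, each batch briefly occupies CPU memory on its way there.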

Do you see some abnormal CPU memory usage?

Yes. When I test with the TensorFlow timeline tools, almost all the work except the input image loader is done by the GPU.
I am not sure whether PyTorch has a similar policy to TensorFlow. If it does, there should be no huge CPU memory usage, yet when I run inference on a single image (640x360) and print memory usage with gc, it reports >1 GB used. I am confused by that.
In addition, do you know how to control (reduce) memory usage during inference?

If your Tensors are on the GPU, all operations will happen on the GPU.
Note that CUDA is known to have a high initial memory usage (it’s tracked here).
You can get minimal memory usage during inference simply by wrapping your inference code in a with torch.no_grad(): context.
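A minimal sketch of that pattern (the Conv2d model and the 640x360 input are placeholders for your own network and image):

```python
import torch

model = torch.nn.Conv2d(3, 16, kernel_size=3, padding=1)
model.eval()  # switch off training-time behavior (dropout, batch-norm updates)

x = torch.randn(1, 3, 640, 360)  # placeholder single-image input

# Without no_grad, autograd keeps intermediate buffers for a potential
# backward pass, inflating memory; inside no_grad they are freed immediately.
with torch.no_grad():
    out = model(x)

assert not out.requires_grad  # no autograd graph was recorded
```

This avoids storing the autograd graph, which is usually the largest avoidable memory cost during inference.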