I have a workstation with 3 GPUs. Whenever I run a model on a device other than gpu:0 (say gpu:1), the model still allocates some additional memory on gpu:0 (from my observation, it varies between 600 MB and 900 MB, which seems to depend on the model I am training). I call it additional memory because when I run the same model on gpu:0, that memory is not allocated.
The model runs fine, but the behavior is a little annoying. Does anyone know what's going on here?
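To make the setup concrete, here is a minimal sketch of what I mean (the toy `nn.Linear` model is just a stand-in for my actual one; in my case the extra usage only shows up in nvidia-smi, not in PyTorch's own memory counters):

```python
import torch
import torch.nn as nn

# Toy model standing in for the real one, placed entirely on gpu:1.
model = nn.Linear(4096, 4096).to("cuda:1")
x = torch.randn(64, 4096, device="cuda:1")
out = model(x)

# PyTorch's allocator reports nothing on cuda:0 ...
print(torch.cuda.memory_allocated("cuda:0"))  # 0 bytes
print(torch.cuda.memory_allocated("cuda:1"))  # > 0 bytes
# ... yet nvidia-smi still shows several hundred MB used on physical GPU 0.
```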
Initialization depends on the GPU model; some need more memory, some less.
I don’t get it. If the memory is used for initialization, it should be released as soon as the initialization process completes, right? Or should I manually release the memory with some hack?
It’s memory that CUDA needs in order to work. It isn’t a one-off initialization step that gets released afterwards; it’s more like the packages that have to stay loaded to make that specific GPU work.
As I mentioned, the amount depends on the model, but that behavior is normal.
Alright, it still seems like weird behavior to me, because the model doesn’t need that memory at all after initialization. Is there any API I can use to manually release it?
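For reference, the only release-related API I’m aware of is `torch.cuda.empty_cache()`; as far as I understand it only frees blocks cached by PyTorch’s allocator, so I wouldn’t expect it to help here, but this is what I mean:

```python
import torch

# Returns cached, *unused* blocks from PyTorch's caching allocator to the driver.
# It does not tear down the CUDA context, so the extra ~600-900 MB on gpu:0 stays.
torch.cuda.empty_cache()
```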
It’s not a PyTorch issue but rather NVIDIA’s. I don’t think they would waste memory, since it’s a stable and widely used library, but your question is beyond my knowledge. @ptrblck may be able to help, as I believe he is from NVIDIA.
Thanks! I think maybe I shouldn’t worry about it too much.
It’s okay. Being curious is an excellent way to learn. If you find out more, write me back!
Based on the size, the CUDA context seems to be initialized on GPU0 (and I thought we got rid of this, but cannot find the issue on GitHub).
Anyway, if you only want to use specific devices, you can execute your script via
CUDA_VISIBLE_DEVICES=1,2 python script.py
to mask all other devices.
Note that internally in your script, the GPUs will be remapped starting at index 0.
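If it helps, the same masking can also be done from inside the script, as long as the variable is set before anything initializes CUDA (setting it before importing torch is safest); a minimal sketch:

```python
import os

# Same effect as `CUDA_VISIBLE_DEVICES=1,2 python script.py`;
# must be set before the CUDA driver is initialized.
os.environ["CUDA_VISIBLE_DEVICES"] = "1,2"

import torch

print(torch.cuda.device_count())              # 2: only the unmasked GPUs are visible
model = torch.nn.Linear(10, 10).to("cuda:0")  # cuda:0 now maps to physical GPU 1
```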