I train my model on
"cuda:0", but I can see that a few MBs are also allocated on the other GPUs. Do you know what could be happening? Why is a little memory on the other GPUs used when I only train on one?
I’m not sure if your code tries to initialize a CUDA context on all devices, but you could avoid it by masking all other GPUs via:
CUDA_VISIBLE_DEVICES=0 python script.py args
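Alternatively, you can set the same variable from inside the script. As a minimal sketch (the key assumption being that the variable is set *before* `torch` or any other CUDA-using library is imported, since it has no effect once a CUDA context already exists):

```python
import os

# Mask all GPUs except device 0. This must run before importing torch,
# otherwise the CUDA runtime may already have enumerated all devices.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Only now import torch; it will see a single GPU, exposed as "cuda:0":
# import torch
# torch.cuda.device_count()  # would report 1 with one GPU visible

print(os.environ["CUDA_VISIBLE_DEVICES"])
```

With the mask in place, the remaining visible device is renumbered to `cuda:0`, so the rest of your training code does not need to change.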