When I use one CUDA device, the memory of another is also used

I train my model on "cuda:0", but I can see that my model also allocates a few MBs on the other GPUs. Do you know what could be happening? Why, when I use just one GPU, is a little bit of memory on the others also used?

I’m not sure which part of your code initializes a CUDA context on all devices, but you can avoid it by masking all other GPUs via:

CUDA_VISIBLE_DEVICES=0 python script.py args
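If you prefer to keep everything inside the script instead of prefixing the shell command, a minimal sketch is to set the variable in Python before importing any CUDA-using framework (the variable only takes effect if it is set before CUDA is initialized, so the import order matters):

```python
import os

# Mask all GPUs except device 0 *before* importing torch (or any other
# CUDA-using library). Once the framework initializes CUDA, changing
# this variable has no effect on which devices it sees.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# import torch  # import your framework only after setting the variable;
# inside the script, the single visible GPU then appears as "cuda:0"
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

With this mask in place, the other GPUs are invisible to the process, so no context (and no memory) can be allocated on them.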