I have 2 GPUs. When I train on a single GPU, both of the following commands work:
CUDA_VISIBLE_DEVICES=0 python xxx.py
CUDA_VISIBLE_DEVICES=1 python xxx.py
However, when I try to train on both GPUs with
CUDA_VISIBLE_DEVICES=0,1 python xxx.py
it no longer works. Only the default GPU:0 is used for training, and once GPU:0 runs out of memory the training terminates with an 'out of memory' error, while GPU:1 sits idle and is never used. Why?
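As far as I understand, CUDA_VISIBLE_DEVICES only controls which devices the process is allowed to see, so both GPUs should be visible inside the process. A quick check like the one below (just a sketch, not taken from my actual script) should print 2 visible devices and 0 as the default device index:

```python
import os
import torch

# Check which devices the process can see (expected values are assumptions).
print(os.environ.get("CUDA_VISIBLE_DEVICES"))  # expected: "0,1"
print(torch.cuda.device_count())               # expected: 2
print(torch.cuda.current_device())             # expected: 0 (the default device)
```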
The GPU information (nvidia-smi output during training) is shown below:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00    Driver Version: 440.64.00    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00006B71:00:00.0 Off |                    0 |
| N/A   54C    P0    82W / 149W |   8772MiB / 11441MiB |     40%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 000096F1:00:00.0 Off |                    0 |
| N/A   25C    P8    32W / 149W |     11MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2495      C   python                                      8759MiB |
+-----------------------------------------------------------------------------+
Could someone explain this situation, and what should I do to solve it?
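Is setting CUDA_VISIBLE_DEVICES alone supposed to be enough, or do I also need to change the script itself, e.g. wrap the model in torch.nn.DataParallel (or use DistributedDataParallel)? Here is a rough sketch of what I am guessing would be needed; the model and tensor shapes are placeholders, not my actual code:

```python
import torch
import torch.nn as nn

# Placeholder model; my real model is different.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

if torch.cuda.device_count() > 1:
    # Replicate the model on all visible GPUs; each batch gets split across them.
    model = nn.DataParallel(model)
model = model.cuda()

# Dummy forward pass: DataParallel scatters the batch over GPU:0 and GPU:1
# and gathers the outputs back on GPU:0.
x = torch.randn(64, 128).cuda()
out = model(x)
print(out.shape)  # torch.Size([64, 10])
```

Is something along these lines required for the second GPU to be used at all?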
Thanks in advance; I really appreciate any feedback.