Multiple GPU with os CUDA_VISIBLE_DEVICES does not work

Hi, I’ve tried to set CUDA_VISIBLE_DEVICES = ‘1’ in the main function, but when I move the model to cuda, it does not move to GPU1 but to GPU0 instead (resulting in OOM because GPU0 is in use). Please tell me if I’m doing something wrong.
Here is my code:

def main(config_file_path):
    config = SspmYamlConfig(config_file_path)

    dataloader_cfg = config.get_dataloader_cfg()
    trainer_cfg = config.get_trainer_cfg()
    logger_cfg = config.get_logger_cfg()
    model_cfg = config.get_model_cfg()
    pose_dataset_cfg = config.get_pose_dataset_cfg()
    data_augmentation_cfg = config.get_augmentation_cfg()
    target_generator_cfg = config.get_target_generator_cfg()

    learning_rate = trainer_cfg['optimizer']['learning_rate']
    # trainer_cfg['device'] is [1] in the config
    device = ','.join(map(str, trainer_cfg['device']))
    os.environ['CUDA_DEVICE_ORDER'] = 'PCI_BUS_ID'
    os.environ['CUDA_VISIBLE_DEVICES'] = device
    model = getModel(model_cfg)
    train_loader = DataLoader(train_dataset, **dataloader_cfg['train'])
    val_loader = DataLoader(val_dataset, **dataloader_cfg['val'])
    trainer = Trainer(
        model, optimizer, logger,
        writer, config, train_loader, val_loader
    )

The Trainer class inherits from BaseTrainer, where the model is transferred to cuda:

class BaseTrainer(ABC):
    def __init__(self, model, optimizer, logger, writer, config):
        self.config = config
        self.logger = logger
        self.writer = writer
        self.optimizer = optimizer
        self.trainer_config = config.get_trainer_cfg()
        self.device_list = self.trainer_config['device'] #device list is [1]
        self.device_type = self._check_gpu(self.device_list)
        self.device = torch.device(self.device_type)
        self.model = model
        self.model = self.model.to(self.device)
        self.model = torch.nn.DataParallel(self.model)

    def _check_gpu(self, gpus):
        if len(gpus) > 0 and torch.cuda.is_available():
            for i in gpus:
                handle = pynvml.nvmlDeviceGetHandleByIndex(i)
                meminfo = pynvml.nvmlDeviceGetMemoryInfo(handle)
                memused = meminfo.used / 1024 / 1024
                print('GPU{} used: {}M'.format(i, memused))
                if memused > 1000:
                    raise ValueError('GPU{} is occupied!'.format(i))
            return 'cuda'
        else:
            print('Using CPU!')
            return 'cpu'

If you are masking devices via CUDA_VISIBLE_DEVICES, all visible devices will be mapped to device ids in the range [0, nb_visible_devices - 1].
E.g. if your system has two GPUs and you set CUDA_VISIBLE_DEVICES=1, you would have to access that GPU inside the script as cuda:0.
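
To make the renumbering concrete, here is a small sketch in plain Python. The helper name logical_device_map is made up for illustration (it is not part of PyTorch or CUDA), and it assumes CUDA_VISIBLE_DEVICES contains only comma-separated integer indices, not GPU UUIDs:

```python
def logical_device_map(visible_devices):
    """Map each physical GPU index in a CUDA_VISIBLE_DEVICES string
    to the device name the process would use for it.

    Hypothetical helper: assumes plain integer indices only.
    """
    physical = [int(tok) for tok in visible_devices.split(",") if tok.strip()]
    # Visible devices are renumbered from 0 in the order they are listed.
    return {phys: "cuda:{}".format(logical) for logical, phys in enumerate(physical)}


print(logical_device_map("1"))    # {1: 'cuda:0'}
print(logical_device_map("1,2"))  # {1: 'cuda:0', 2: 'cuda:1'}
```

So with CUDA_VISIBLE_DEVICES=1 the process never sees a cuda:1 at all; physical GPU 1 is simply cuda:0.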

Thank you for your quick reply, but I have a question:
I have 3 GPUs and want to use only GPU1 and GPU2 (GPU0 is in use). How should I do that?

If all devices are the same, use
CUDA_VISIBLE_DEVICES=1,2 python args
to run the script and inside the script use cuda:0 and cuda:1 (or the equivalent .cuda(0), .cuda(1) commands).

However, if the mapping does not match what nvidia-smi shows, you could force PCI bus ordering by putting CUDA_DEVICE_ORDER=PCI_BUS_ID in front of the aforementioned command.
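
For the two-GPU case above, the code inside the script could look roughly like this. It is only a sketch: it assumes the script was launched with CUDA_VISIBLE_DEVICES=1,2, the function name build_parallel_model and the tiny Linear model are placeholders, and it returns None when PyTorch is missing or fewer than two GPUs are visible:

```python
try:
    import torch
except ImportError:  # lets the sketch run on machines without PyTorch
    torch = None


def build_parallel_model():
    """Wrap a placeholder model in DataParallel over logical devices 0 and 1.

    Hypothetical helper; returns None if two visible GPUs are not available.
    """
    if torch is None or torch.cuda.device_count() < 2:
        return None
    # Inside the process the two visible GPUs are cuda:0 and cuda:1,
    # regardless of their physical indices.
    model = torch.nn.Linear(8, 2).to("cuda:0")
    return torch.nn.DataParallel(model, device_ids=[0, 1])


parallel_model = build_parallel_model()
```

Note that device_ids uses the logical ids 0 and 1, not the physical ids 1 and 2 from the command line.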

I’ve found that I need to set CUDA_VISIBLE_DEVICES at the beginning of my script. It was my mistake, thank you for your help. I will close this topic.
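
For anyone landing here later, a minimal sketch of what "at the beginning of the script" can look like. The variable is read when CUDA is initialized in the process, so setting it before importing torch is the safe choice. The value '1' mirrors the config in the question, and pick_device is a made-up helper name:

```python
import os

# Set these before anything in the process initializes CUDA.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

try:
    import torch
except ImportError:  # lets the sketch run on machines without PyTorch
    torch = None


def pick_device():
    """Return the logical device name to use (hypothetical helper)."""
    if torch is not None and torch.cuda.is_available():
        # Physical GPU 1 is the only visible device, so it is cuda:0 here.
        return "cuda:0"
    return "cpu"


device = pick_device()
```

Setting the variable on the command line, as suggested earlier in the thread, avoids the ordering problem entirely.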