RuntimeError("No GPUs available.")

Hi,
I was running my code on 8 GPUs while changing the part of the code that reads data. Suddenly, after a few runs, I started getting this error:

File "~/anaconda3/envs/_new/lib/python3.10/site-packages/pytorch_lightning/utilities/argparse.py", line 348, in insert_env_defaults
    return fn(self, **kwargs)
File "~/anaconda3/envs/_new/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 419, in __init__
    self._accelerator_connector = AcceleratorConnector(
File "~/anaconda3/envs/_new/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py", line 205, in __init__
    self._set_parallel_devices_and_init_accelerator()
File "~/anaconda3/envs/_new/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py", line 566, in _set_parallel_devices_and_init_accelerator
    self._set_devices_flag_if_auto_select_gpus_passed()
File "~/anaconda3/envs/_new/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py", line 583, in _set_devices_flag_if_auto_select_gpus_passed
    self._devices_flag = pick_multiple_gpus(
File "~/anaconda3/envs/_new/lib/python3.10/site-packages/pytorch_lightning/tuner/auto_gpu_select.py", line 54, in pick_multiple_gpus
    picked.append(pick_single_gpu(exclude_gpus=picked, _show_deprecation=False))
File "~/anaconda3/envs/_new/lib/python3.10/site-packages/pytorch_lightning/tuner/auto_gpu_select.py", line 96, in pick_single_gpu
    raise RuntimeError("No GPUs available.")

The GPUs are available, though, and work for other code.
Could you please let me know how I can fix it?
Thanks!

What is the output of “torch.cuda.is_available()” in your case? Have you started training yet?

Previously it was True, but for some reason it now returns False. I learned that the issue is CUDA_VISIBLE_DEVICES: after I set it to 1, one of the GPUs was recognized:

export CUDA_VISIBLE_DEVICES=1

However, when I set it to all GPUs:

export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7

I get the same error:
lib/python3.10/site-packages/pytorch_lightning/tuner/auto_gpu_select.py", line 96, in pick_single_gpu
    raise RuntimeError("No GPUs available.")
RuntimeError: No GPUs available.
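
To narrow this down, it can help to print what the process itself sees before Lightning gets involved. The following is a minimal diagnostic sketch, not part of Lightning: `describe_cuda_env` is a made-up helper name, and the per-device check assumes that placing a tiny tensor on a device is a reasonable stand-in for whatever `pick_single_gpu` tests before it gives up.

```python
import os


def describe_cuda_env():
    """Collect what this process can see of the GPUs (hypothetical helper)."""
    report = {
        # None means the variable is unset; "" hides every GPU.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "torch_installed": False,
        "cuda_available": False,
        "device_count": 0,
        "devices": {},
    }
    try:
        import torch
    except ImportError:
        # Without torch there is nothing more to inspect.
        return report

    report["torch_installed"] = True
    report["cuda_available"] = torch.cuda.is_available()
    report["device_count"] = torch.cuda.device_count()

    for index in range(report["device_count"]):
        try:
            # Place a tiny tensor on the device; keep the error text if it fails.
            torch.ones(1).to(torch.device(f"cuda:{index}"))
            report["devices"][index] = "ok"
        except (RuntimeError, AssertionError) as err:
            report["devices"][index] = f"failed: {err}"
    return report


if __name__ == "__main__":
    for key, value in describe_cuda_env().items():
        print(f"{key}: {value}")
```

Running this in the same shell and conda environment as the training script shows whether the variable, the device count, or an individual GPU is the part that differs from a working setup.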

How did you start your training job on 8 GPUs?

There are 8 GPUs with 16 GB each:
import pytorch_lightning as pl

# callbacks, config, model, the loaders, _get_devices() and _get_strategy()
# are defined elsewhere in the project.
trainer = pl.Trainer(
    accelerator="auto",
    auto_select_gpus=True,
    callbacks=callbacks,
    devices=_get_devices(),
    max_epochs=config["max_epochs"],
    strategy=_get_strategy(),
)

trainer.fit(
    model, train_loader.train_dataloader(), val_loader.val_dataloader()
)
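
Since the traceback goes through `pick_multiple_gpus`, one debugging option is to take `auto_select_gpus` out of the picture and name the devices explicitly. The sketch below only builds the keyword arguments. `explicit_gpu_kwargs` is a hypothetical helper, and it assumes a Lightning 1.x `Trainer` that accepts `accelerator="gpu"` together with a list of device indices for `devices`. Whether this avoids the error depends on why the GPUs are not visible in the first place.

```python
import os


def explicit_gpu_kwargs(num_gpus):
    """Build Trainer kwargs that name GPUs explicitly (hypothetical helper)."""
    visible = os.environ.get("CUDA_VISIBLE_DEVICES")
    if visible is not None:
        # Device indices are relative to the visible list, so
        # CUDA_VISIBLE_DEVICES=4,5 exposes them as devices 0 and 1.
        num_visible = len([d for d in visible.split(",") if d.strip()])
        num_gpus = min(num_gpus, num_visible)
    if num_gpus == 0:
        # Nothing visible: fall back to a single CPU process.
        return {"accelerator": "cpu", "devices": 1}
    return {"accelerator": "gpu", "devices": list(range(num_gpus))}
```

The result could then be passed as `pl.Trainer(**explicit_gpu_kwargs(8), ...)` alongside the other arguments, leaving out `auto_select_gpus`.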