Dear PyTorch and NVIDIA teams,
I am writing to report an unexpected behavior I’ve encountered when working with PyTorch and CUDA on a WSL2 system under Windows 11, equipped with multiple NVIDIA RTX 3090 GPUs.
- Operating System: Windows 11
- CUDA Version: 12.2
- WSL Version: 2
- GPUs: 4x NVIDIA RTX 3090
- PyTorch Version: 2.0.1 (built for CUDA 11.8)
Problem Statement: When I set the CUDA_VISIBLE_DEVICES environment variable to enable all four GPUs (0,1,2,3) and then run a PyTorch script that calls torch.cuda.is_available(), I encounter an “Out of Memory” error. Notably, the error does not occur if I enable only GPU 1, or the combination of GPUs 0, 2, and 3. Furthermore, the error can be circumvented with the extra call described under Workaround below.
Steps to Reproduce:
- Set the environment variable: export CUDA_VISIBLE_DEVICES=0,1,2,3
- Run a Python script that imports PyTorch and calls torch.cuda.is_available().
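For reference, the two steps above can be combined into one minimal, self-contained script (setting the variable from Python before the first CUDA call is equivalent to exporting it in the shell):

```python
import os

# Select all four GPUs. This must happen before the first CUDA call,
# or PyTorch/CUDA will ignore the setting.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"

import torch

# On the affected WSL2 multi-GPU setup, this call fails internally
# with an "Out of Memory" error instead of simply returning True.
print(torch.cuda.is_available())
```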
Expected Behavior: The torch.cuda.is_available() function should return True if GPUs are available and accessible.
Observed Behavior: An “Out of Memory” error is triggered internally when torch.cuda.is_available() is called, and the function returns False.
Workaround: I found that calling torch.cuda.device_count() before torch.cuda.is_available() circumvents the error. However, this workaround requires modifying each script to include this extra call.
While the workaround is effective, it may be beneficial to investigate and address the root cause of this issue. I wanted to bring this to your attention and look forward to any insights or potential solutions you might provide.
Thank you for your time and assistance.
Best Regards, Peter Deng, Language and Technology Research Group at TC, Columbia