You might want to check this post.
Thanks! I think you said it should be OK if we are using a single GPU. In my case I am using a single GPU, so it should work.
Hi, I encountered the same issue with Windows not supporting NCCL. I only want to use a single GPU, but I don't know how to resolve it. Here is the relevant information. Can you provide me with a solution?
Collecting environment information...
PyTorch version: 1.13.1+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A
OS: Microsoft Windows 10 Professional
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A
Python version: 3.9.18 (main, Sep 11 2023, 14:09:26) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19045-SP0
Is CUDA available: True
CUDA runtime version: 12.3.103
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 4090
Nvidia driver version: 551.76
cuDNN version: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\bin\cudnn_ops_train64_8.dll
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] torch==1.13.1+cu117
[pip3] torchaudio==0.13.1+cu117
[pip3] torchvision==0.14.1+cu117
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.8.0 hd77b12b_0
[conda] mkl 2023.1.0 h6b88ed4_46358
[conda] mkl-service 2.4.0 py39h2bbff1b_1
[conda] mkl_fft 1.3.8 py39h2bbff1b_0
[conda] mkl_random 1.2.4 py39h59b6b97_0
[conda] numpy 1.26.4 py39h055cbcc_0
[conda] numpy-base 1.26.4 py39h65a83cf_0
[conda] pytorch-mutex 1.0 cpu pytorch
[conda] torch 1.13.1+cu117 pypi_0 pypi
[conda] torchaudio 0.13.1+cu117 pypi_0 pypi
[conda] torchvision 0.14.1+cu117 pypi_0 pypi
torch.cuda.nccl.is_available(torch.randn(1).cuda())
D:\anaconda\envs\McQuic_1\lib\site-packages\torch\cuda\nccl.py:15: UserWarning: PyTorch is not compiled with NCCL support
warnings.warn('PyTorch is not compiled with NCCL support')
False
I have the same issue. This is my output.
Collecting environment information...
PyTorch version: 2.2.2+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A
OS: Microsoft Windows Server 2019 Standard
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A
Python version: 3.10.13 | packaged by Anaconda, Inc. | (main, Sep 11 2023, 13:24:38) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.17763-SP0
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 3090
GPU 1: NVIDIA GeForce RTX 3090
Nvidia driver version: 472.12
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture=9
CurrentClockSpeed=2101
DeviceID=CPU0
Family=179
L2CacheSize=16384
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=2101
Name=Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz
ProcessorType=3
Revision=21767
Architecture=9
CurrentClockSpeed=2101
DeviceID=CPU1
Family=179
L2CacheSize=16384
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=2101
Name=Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz
ProcessorType=3
Revision=21767
Versions of relevant libraries:
[pip3] flake8==6.1.0
[pip3] numpy==1.26.3
[pip3] torch==2.2.2+cu118
[pip3] torchaudio==2.2.2+cu118
[pip3] torchvision==0.17.2+cu118
[conda] blas 1.0 mkl
[conda] mkl 2023.1.0 h6b88ed4_46358
[conda] mkl-service 2.4.0 py310h2bbff1b_1
[conda] mkl_fft 1.3.8 py310h2bbff1b_0
[conda] mkl_random 1.2.4 py310h59b6b97_0
[conda] numpy 1.26.3 py310h055cbcc_0
[conda] numpy-base 1.26.3 py310h65a83cf_0
[conda] torch 2.2.2+cu118 pypi_0 pypi
[conda] torchaudio 2.2.2+cu118 pypi_0 pypi
[conda] torchvision 0.17.2+cu118 pypi_0 pypi
As far as I can tell, this is a Windows issue, isn't it? The line:
torch.cuda.nccl.is_available(torch.randn(1).cuda())
also returns False.
I have the same issue.
python -m torch.utils.collect_env
Output: Collecting environment information...
PyTorch version: 1.8.0+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A
OS: Microsoft Windows Server 2022 Datacenter
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Python version: 3.9 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: Tesla T4
GPU 1: Tesla T4
GPU 2: Tesla T4
GPU 3: Tesla T4
Nvidia driver version: 551.78
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.23.0
[pip3] torch==1.8.0+cu111
[pip3] torchaudio==2.2.0.dev20240426+cu121
[pip3] torchmetrics==0.8.0
[pip3] torchvision==0.19.0.dev20240426+cu121
[conda] _anaconda_depends 2024.02 py311_mkl_1
[conda] blas 1.0 mkl
[conda] mkl 2021.4.0 pypi_0 pypi
[conda] mkl-service 2.4.0 py311h2bbff1b_1
[conda] mkl_fft 1.3.8 py311h2bbff1b_0
[conda] mkl_random 1.2.4 py311h59b6b97_0
[conda] numpy 1.26.4 py311hdab7c0b_0
[conda] numpy-base 1.26.4 py311hd01c5d8_0
[conda] numpydoc 1.5.0 py311haa95532_0
[conda] pytorch 2.2.2 py3.11_cuda11.8_cudnn8_0 pytorch
[conda] pytorch-cuda 11.8 h24eeafa_5 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torch 2.4.0.dev20240421+cu121 pypi_0 pypi
[conda] torchaudio 2.2.0.dev20240421+cu121 pypi_0 pypi
[conda] torchvision 0.17.2 pypi_0 pypi
import torch.cuda.nccl
torch.cuda.nccl.is_available(torch.randn(1).cuda())
Output: False
You are running into the same issue as above.
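As a quick sanity check, you can ask PyTorch directly which distributed backends your build ships. A minimal sketch (on the official Windows wheels, NCCL is expected to be unavailable and Gloo available):

import torch.distributed as dist

# NCCL is not compiled into the official Windows binaries, so this prints False there.
print("NCCL available:", dist.is_nccl_available())
# Gloo is the distributed backend supported on Windows.
print("Gloo available:", dist.is_gloo_available())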
Hi,
I have the same issue when running this code: https://github.com/12wang3/rrl
Here is my environment information:
<frozen runpy>:128: RuntimeWarning: 'torch.utils.collect_env' found in sys.modules after import of package 'torch.utils', but prior to execution of 'torch.utils.collect_env'; this may result in unpredictable behaviour
Collecting environment information...
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Microsoft Windows 11 Home Single Language
GCC version: (Rev6, Built by MSYS2 project) 13.2.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A
Python version: 3.11.7 | packaged by Anaconda, Inc. | (main, Dec 15 2023, 18:05:47) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22621-SP0
Is CUDA available: True
CUDA runtime version: 12.4.99
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 4060 Laptop GPU
Nvidia driver version: 555.99
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture=9
CurrentClockSpeed=2400
DeviceID=CPU0
Family=198
L2CacheSize=4096
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=2400
Name=13th Gen Intel(R) Core(TM) i7-13700H
ProcessorType=3
Revision=
Versions of relevant libraries:
[pip3] flake8==6.0.0
[pip3] mypy==1.8.0
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] numpydoc==1.5.0
[pip3] torch==2.3.1+cu121
[pip3] torchaudio==2.3.1+cu121
[pip3] torchvision==0.18.1
[conda] Could not collect
I am using just a single GPU, but it raises the same error:
raise RuntimeError("Distributed package doesn't have NCCL built in")
RuntimeError: Distributed package doesn't have NCCL built in
Also, when I run the code below:
import torch
torch.cuda.nccl.is_available(torch.randn(1).cuda())
The output is:
G:\Program Files\anaconda3\Lib\site-packages\torch\cuda\nccl.py:15: UserWarning: PyTorch is not compiled with NCCL support
warnings.warn("PyTorch is not compiled with NCCL support")
False
Thanks for your help
Same as above: use Gloo on Windows machines or disable the distributed usage.
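For reference, a minimal single-process Gloo initialization could look like the sketch below; the MASTER_ADDR and MASTER_PORT values are placeholders you would adapt to your own setup:

import os
import torch.distributed as dist

# Placeholder rendezvous settings for a single-machine run.
os.environ["MASTER_ADDR"] = "127.0.0.1"
os.environ["MASTER_PORT"] = "29500"

# Gloo works on Windows; the Windows binaries are not built with NCCL.
dist.init_process_group(backend="gloo", init_method="env://", rank=0, world_size=1)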
When I use Gloo as the backend in the train_model function in the experiment.py file:
def train_model(gpu, args):
    rank = args.nr * args.gpus + gpu
    dist.init_process_group(backend='gloo', init_method='env://', world_size=args.world_size, rank=rank)
I get a new error:
(base) PS C:\Users\ASUS\Downloads\rrl-main\rrl-main> python experiment.py -d tic-tac-toe -bs 32 -s 1@16 -e401 -lrde 200 -lr 0.002 -ki 0 -i 0 -wd 0.0001 --print_rule
[W socket.cpp:697] [c10d] The client socket has failed to connect to [Romina]:58753 (system error: 10049 - The requested address is not valid in its context.).
Traceback (most recent call last):
File "C:\Users\ASUS\Downloads\rrl-main\rrl-main\experiment.py", line 174, in <module>
train_main(rrl_args)
File "C:\Users\ASUS\Downloads\rrl-main\rrl-main\experiment.py", line 167, in train_main
mp.spawn(train_model, nprocs=args.gpus, args=(args,))
File "G:\Program Files\anaconda3\Lib\site-packages\torch\multiprocessing\spawn.py", line 281, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method="spawn")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "G:\Program Files\anaconda3\Lib\site-packages\torch\multiprocessing\spawn.py", line 237, in start_processes
while not context.join():
^^^^^^^^^^^^^^
File "G:\Program Files\anaconda3\Lib\site-packages\torch\multiprocessing\spawn.py", line 188, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "G:\Program Files\anaconda3\Lib\site-packages\torch\multiprocessing\spawn.py", line 75, in _wrap
fn(i, *args)
File "C:\Users\ASUS\Downloads\rrl-main\rrl-main\experiment.py", line 70, in train_model
db_enc, train_loader, valid_loader, _ = get_data_loader(dataset, args.world_size, rank, args.batch_size,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\ASUS\Downloads\rrl-main\rrl-main\experiment.py", line 26, in get_data_loader
db_enc.fit(X_df, y_df)
File "C:\Users\ASUS\Downloads\rrl-main\rrl-main\rrl\utils.py", line 59, in fit
self.y_fname = list(self.label_enc.get_feature_names(y_df.columns)) if self.y_one_hot else y_df.columns
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'OneHotEncoder' object has no attribute 'get_feature_names'
And I don't know how to disable the distributed usage.
I just renamed get_feature_names to get_feature_names_out and that solved my issue!
Thanks
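For context: scikit-learn deprecated OneHotEncoder.get_feature_names in 1.0 and removed it in 1.2 in favor of get_feature_names_out, so the rename is the right fix on newer scikit-learn versions. A minimal illustration:

from sklearn.preprocessing import OneHotEncoder

# get_feature_names was removed in scikit-learn 1.2; use get_feature_names_out instead.
enc = OneHotEncoder().fit([["a"], ["b"]])
print(enc.get_feature_names_out(["label"]))  # ['label_a' 'label_b']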
I am getting this error:
raise RuntimeError("Distributed package doesn't have NCCL " "built in")
RuntimeError: Distributed package doesn't have NCCL built in
I am on Windows 11 with a 12 GB NVIDIA RTX 3060 GPU.
How can I resolve it?
CUDA available: True
CUDA version in PyTorch: 11.3
PyTorch version: 1.12.1+cu113
GPU is available
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
initializing ddp: GLOBAL_RANK: 0, MEMBER: 1/1
[W C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\distributed\c10d\socket.cpp:558] [c10d] The client socket has failed to connect to [DESKTOP-H0Q7GBK]:57324 (system error: 10049 - The requested address is not valid in its context.).
[2024-09-06 16:22:21,118][main][CRITICAL] - Training failed due to Distributed package doesn't have NCCL built in:
Traceback (most recent call last):
  File "bin/train.py", line 83, in main
    trainer.fit(training_model)
  File "C:\Users\spx016\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\pytorch_lightning\trainer\trainer.py", line 496, in fit
    self.pre_dispatch()
  File "C:\Users\spx016\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\pytorch_lightning\trainer\trainer.py", line 525, in pre_dispatch
    self.accelerator.pre_dispatch()
  File "C:\Users\spx016\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\pytorch_lightning\accelerators\accelerator.py", line 83, in pre_dispatch
    self.training_type_plugin.pre_dispatch()
  File "C:\Users\spx016\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\pytorch_lightning\plugins\training_type\ddp.py", line 258, in pre_dispatch
    self.init_ddp_connection(self.global_rank, self.world_size)
  File "C:\Users\spx016\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\pytorch_lightning\plugins\training_type\ddp.py", line 241, in init_ddp_connection
    torch_distrib.init_process_group(self.torch_distributed_backend, rank=global_rank, world_size=world_size)
  File "C:\Users\spx016\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\torch\distributed\distributed_c10d.py", line 602, in init_process_group
    default_pg = _new_process_group_helper(
  File "C:\Users\spx016\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\torch\distributed\distributed_c10d.py", line 727, in _new_process_group_helper
    raise RuntimeError("Distributed package doesn't have NCCL " "built in")
RuntimeError: Distributed package doesn't have NCCL built in
The code is:
trainer = Trainer(
    # there is no need to suppress checkpointing in ddp, because it handles rank on its own
    callbacks=ModelCheckpoint(dirpath=checkpoints_dir, **config.trainer.checkpoint_kwargs),
    logger=metrics_logger,
    default_root_dir=os.getcwd(),
    **trainer_kwargs
)
trainer.fit(training_model)
Same as above: use Gloo on Windows machines or disable the distributed usage.
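If you are on a recent PyTorch Lightning release (the traceback above is from an older one, where the knobs differ), a sketch of both options might look like this; treat it as a starting point, not a drop-in fix for your exact config:

from pytorch_lightning import Trainer
from pytorch_lightning.strategies import DDPStrategy

# Option 1: single GPU, no distributed backend involved at all.
trainer = Trainer(accelerator="gpu", devices=1)

# Option 2: keep DDP but tell it to use Gloo instead of NCCL.
trainer = Trainer(accelerator="gpu", devices=1, strategy=DDPStrategy(process_group_backend="gloo"))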