YOLOv5 2022-3-25 torch 1.8.1+cu111 CUDA:0 (NVIDIA GeForce RTX 3060 Laptop GPU, 6144MiB) -> Invalid CUDA '--device 0'

luka123 · March 26, 2022, 10:03am

I'm trying to train yolov5s on bdd100k locally on my GPU, but I get the error below even though my GPU check passes. I installed the latest versions of CUDA and PyTorch.

ERROR: github: up to date with GitHub - ultralytics/yolov5: YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
fatal: cannot change to 'C:\Users\Luka\Desktop\Berkeley': No such file or directory
Traceback (most recent call last):
File "C:\Users\Luka\Desktop\Berkeley dataset\yolov5s_bdd100k\yolov5\train.py", line 643, in <module>
main(opt)
File "C:\Users\Luka\Desktop\Berkeley dataset\yolov5s_bdd100k\yolov5\train.py", line 525, in main
device = select_device(opt.device, batch_size=opt.batch_size)
File "C:\Users\Luka\Desktop\Berkeley dataset\yolov5s_bdd100k\yolov5\utils\torch_utils.py", line 61, in select_device
assert torch.cuda.is_available() and torch.cuda.device_count() >= len(device.replace(',', '')),
AssertionError: Invalid CUDA '--device 0' requested, use '--device cpu' or pass valid CUDA device(s)

After that, I ran nvidia-smi.

ptrblck · March 26, 2022, 9:08pm

It seems your PyTorch installation cannot find the GPU.
What is torch.version.cuda returning and could you post the output of python -m torch.utils.collect_env, please?
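A minimal sketch of the same check from inside Python, in case it helps to run it in the exact interpreter that launches train.py. The helper name `cuda_report` is made up for this example; the attributes it reads (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`, `torch.cuda.device_count()`) are standard PyTorch ones.

```python
import importlib.util


def cuda_report():
    """Collect the basic facts needed to see whether PyTorch can use the GPU.

    `cuda_report` is a hypothetical helper name used only in this sketch.
    """
    # Avoid an ImportError if torch is missing from the active environment.
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None on CPU-only builds, otherwise a string such as "11.3".
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` prints `None`, the interpreter is most likely importing a CPU-only build, whatever else is installed on the machine.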

luka123 · March 27, 2022, 5:01pm

torch.version.cuda returns 11.1.

…collect_env output:
Collecting environment information…
PyTorch version: 1.8.0
Is debug build: False
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Education
GCC version: Could not collect
Clang version: Could not collect
CMake version: version 3.22.3

Python version: 3.9 (64-bit runtime)
Is CUDA available: False
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3060 Laptop GPU
Nvidia driver version: 512.15
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.22.3
[pip3] torch==1.11.0
[pip3] torchaudio==0.11.0+cu113
[pip3] torchvision==0.12.0
[conda] blas 1.0 mkl
[conda] cpuonly 1.0 0 pytorch
[conda] cudatoolkit 11.3.1 h59b6b97_2
[conda] mkl 2021.4.0 haa95532_640
[conda] mkl-include 2022.0.0 haa95532_115
[conda] mkl-service 2.4.0 py39h2bbff1b_0
[conda] mkl_fft 1.3.1 py39h277e83a_0
[conda] mkl_random 1.2.2 py39hf11a4ad_0
[conda] numpy 1.22.3 pypi_0 pypi
[conda] pytorch 1.8.0 py3.9_cpu_0 [cpuonly] pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torch 1.11.0 pypi_0 pypi
[conda] torchaudio 0.11.0+cu113 pypi_0 pypi
[conda] torchvision 0.12.0 pypi_0 pypi

ptrblck · March 27, 2022, 9:39pm

It seems your environments might be a bit mixed up as you claim to use the CUDA 11.1 runtime, while the setup output shows no CUDA at all:

CUDA used to build PyTorch: Could not collect
...
Is CUDA available: False

And also shows different installed packages:

PyTorch version: 1.8.0
[pip3] torch==1.11.0
[pip3] torchaudio==0.11.0+cu113
[pip3] torchvision==0.12.0
[conda] cpuonly 1.0 0 pytorch
[conda] pytorch 1.8.0 py3.9_cpu_0 [cpuonly] pytorch
[conda] torch 1.11.0 pypi_0 pypi
[conda] torchaudio 0.11.0+cu113 pypi_0 pypi
[conda] torchvision 0.12.0 pypi_0 pypi

Based on this, a few cpuonly packages are installed alongside other libraries built against the CUDA 11.3 runtime.
I would recommend creating a new, empty virtual environment and installing the latest stable release there to avoid these issues.
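As a rough illustration of how such a mix-up can be spotted, the sketch below scans the "Versions of relevant libraries" lines of a collect_env report for the warning signs mentioned here. The function name `find_conflicts` and its three heuristics are assumptions made for this example, not part of PyTorch or its tooling.

```python
import re


def find_conflicts(env_lines):
    """Flag signs of a mixed CPU/CUDA install in collect_env library lines.

    Hypothetical helper: the checks are simple heuristics, not an official tool.
    """
    pip_torch = None
    conda_torch = None
    problems = []
    for raw in env_lines:
        line = raw.strip()
        # pip-installed torch, e.g. "[pip3] torch==1.11.0"
        match = re.match(r"\[pip3\]\s+torch==(\S+)", line)
        if match:
            pip_torch = match.group(1)
        # conda-installed pytorch, e.g. "[conda] pytorch 1.8.0 py3.9_cpu_0 ..."
        match = re.match(r"\[conda\]\s+pytorch\s+(\S+)\s+(\S+)", line)
        if match:
            conda_torch = match.group(1)
            build = match.group(2)
            if "cpu" in build.split("_"):
                problems.append(f"conda pytorch is a CPU-only build: {build}")
        # The conda 'cpuonly' marker package pins CPU-only builds.
        if re.match(r"\[conda\]\s+cpuonly\s", line):
            problems.append("conda 'cpuonly' package is installed")
    # Compare versions, ignoring a local suffix such as "+cu113".
    if pip_torch and conda_torch and pip_torch.split("+")[0] != conda_torch:
        problems.append(
            f"pip torch {pip_torch} differs from conda pytorch {conda_torch}"
        )
    return problems


if __name__ == "__main__":
    sample = [
        "[pip3] torch==1.11.0",
        "[pip3] torchaudio==0.11.0+cu113",
        "[conda] cpuonly 1.0 0 pytorch",
        "[conda] pytorch 1.8.0 py3.9_cpu_0 [cpuonly] pytorch",
        "[conda] pytorch-mutex 1.0 cuda pytorch",
    ]
    for problem in find_conflicts(sample):
        print(problem)
```

An empty result does not prove the environment is healthy; it only means none of these particular signs were found.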

luka123 · March 29, 2022, 10:41am

Now I get this error:

OSError: [WinError 1455] The paging file is too small for this operation to complete. Error loading "C:\Users\Luka\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\lib\cublas64_11.dll" or one of its dependencies.

and this error, repeated many times:
File "C:\Users\Luka\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\multiprocessing\reductions.py", line 36, in __del__
File "C:\Users\Luka\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\storage.py", line 520, in _free_weak_ref
AttributeError: 'NoneType' object has no attribute '_free_weak_ref'
Exception ignored in: <function StorageWeakRef.__del__ at 0x00000286F54F7C10>

Here is the python -m torch.utils.collect_env output:
$ python -m torch.utils.collect_env
Collecting environment information…
PyTorch version: 1.11.0+cu113
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Education
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.9.0 (tags/v3.9.0:9cf6752, Oct 5 2020, 15:34:40) [MSC v.1927 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19041-SP0
Is CUDA available: True
CUDA runtime version: 11.6.124
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3060 Laptop GPU
Nvidia driver version: 512.15
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.22.3
[pip3] torch==1.11.0+cu113
[pip3] torchaudio==0.11.0+cu113
[pip3] torchvision==0.12.0+cu113
[conda] Could not collect

luka123 · March 29, 2022, 4:29pm

Problem is SOLVED.
I created a new conda environment and installed PyTorch first, then installed the YOLOv5 requirements with pip inside that environment. Now the packages look like this:
Versions of relevant libraries:
[pip3] numpy==1.21.5
[pip3] torch==1.11.0
[pip3] torchaudio==0.11.0
[pip3] torchvision==0.12.0
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.3.1 h59b6b97_2
[conda] mkl 2021.4.0 haa95532_640
[conda] mkl-service 2.4.0 py39h2bbff1b_0
[conda] mkl_fft 1.3.1 py39h277e83a_0
[conda] mkl_random 1.2.2 py39hf11a4ad_0
[conda] numpy 1.21.5 py39ha4e8547_0
[conda] numpy-base 1.21.5 py39hc2deb75_0
[conda] pytorch 1.11.0 py3.9_cuda11.3_cudnn8_0 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torchaudio 0.11.0 py39_cu113 pytorch
[conda] torchvision 0.12.0 py39_cu113 pytorch

Lastly, I set '--workers 4', and with these settings my GPU can handle a batch size of 25. Training takes 12 minutes per epoch. Thanks for the help :D.
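For anyone reproducing these settings, here is a small sketch that assembles the matching train.py invocation as an argument list. The helper `build_train_command` and the dataset file name `bdd100k.yaml` are placeholders invented for this example; `--data`, `--weights`, `--batch-size`, `--workers` and `--device` are existing YOLOv5 train.py options.

```python
import sys


def build_train_command(workers=4, batch_size=25, device="0",
                        data="bdd100k.yaml", weights="yolov5s.pt"):
    """Build the argument list for a YOLOv5 training run.

    Hypothetical helper; "bdd100k.yaml" is a placeholder dataset config name.
    """
    return [
        sys.executable, "train.py",
        "--data", data,
        "--weights", weights,
        "--batch-size", str(batch_size),
        "--workers", str(workers),  # fewer dataloader workers, as in the post above
        "--device", device,
    ]


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to
    # subprocess.run(...) from inside the yolov5 directory to start training.
    print(" ".join(build_train_command()))
```

Passing the list to `subprocess.run` rather than a single string also avoids quoting trouble with paths that contain spaces, such as the "Berkeley dataset" folder in the first traceback.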

10756007 · July 28, 2022, 6:09am

I have the same problem, but the solution doesn't work for me. Please help me.

10756007 · July 28, 2022, 10:50am

Could you help me, please?
I have the same problem.

luka123 · July 28, 2022, 11:20am

Hi, you should create a virtual environment and install the CUDA 11.3 build of PyTorch from the official PyTorch page. After installing the CUDA-enabled packages (torch, torchvision, etc.), clone YOLOv5 and install its requirements with pip. Installing the packages in this order didn't cause any problems for me.