PyTorch training works on CPU but not on GPU

Hi,

I installed PyTorch with conda on Windows 10 (conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch) and checked the installation in a Python session:

python

>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.get_device_name(0)
'NVIDIA GeForce GTX 1650'
>>> print(torch.__version__)
1.10.0
>>> import numpy
>>> numpy.version.version
'1.21.2'
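
For anyone who wants to compare setups, the same checks can be collected in one script. This is only a minimal sketch: the function name collect_env is hypothetical (it is not part of PyTorch or YOLOv5), and it only reads version and device attributes that PyTorch and NumPy expose.

```python
import platform
import sys


def collect_env():
    """Gather version and device details into a plain dict."""
    info = {"python": sys.version.split()[0], "os": platform.platform()}

    try:
        import numpy
        info["numpy"] = numpy.__version__
    except ImportError:
        info["numpy"] = None

    try:
        import torch
    except ImportError:
        # Without torch there is nothing more to report.
        info["torch"] = None
        return info

    info["torch"] = torch.__version__
    # CUDA version this torch build was compiled against (None for CPU-only builds).
    info["cuda_build"] = torch.version.cuda
    info["cudnn"] = torch.backends.cudnn.version()
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
        info["capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

Posting the printed lines together with the training log makes it easier to see whether two machines really run the same build.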

When I run YOLOv5's train.py with:
python train.py --img 512 --workers 1 --batch 2 --epochs 20 --data yolo_train.yaml --weights yolov5s.pt --cache

it prints the following output:
YOLOv5 2021-11-4 torch 1.10.0 CUDA:0 (NVIDIA GeForce GTX 1650, 4096MiB)
  0/19    0.325G       nan       nan       nan         2       512: 100%|████████████████████████████████| 152/152 [00:40<00:00,  3.78it/s]
C:\Users\m.conda\envs\yolov5_gpu3\lib\site-packages\torch\optim\lr_scheduler.py:129: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
warnings.warn("Detected call of lr_scheduler.step() before optimizer.step(). "
           Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100%|████████████████| 424/424 [00:52<00:00,  8.06it/s]
             all       1694          0          0          0          0          0

 Epoch   gpu_mem       box       obj       cls    labels  img_size
  1/19    0.357G       nan       nan       nan         4       512: 100%|████████████████████████████████| 152/152 [00:35<00:00,  4.26it/s]
           Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100%|████████████████| 424/424 [00:52<00:00,  8.04it/s]
             all       1694          0          0          0          0          0

 Epoch   gpu_mem       box       obj       cls    labels  img_size
  2/19    0.357G       nan       nan       nan         2       512: 100%|████████████████████████████████| 152/152 [00:35<00:00,  4.28it/s]

In other words, the model never learns on the GPU: the box, obj, and cls losses are NaN from the very first epoch, and P, R, and mAP stay at 0 in every validation pass.
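
To spot the first bad epoch without reading the whole log, the progress lines can be checked automatically. The sketch below is only an illustration: the helper names parse_progress_line and non_finite_losses are made up for this post, the column names are copied from the header line in the log, and the progress bars in the two sample lines are shortened to "...".

```python
import math

# Column names taken from the header line printed in the training log.
COLUMNS = ("epoch", "gpu_mem", "box", "obj", "cls", "labels", "img_size")
LOSS_COLUMNS = ("box", "obj", "cls")


def parse_progress_line(line):
    """Split one progress line into a dict keyed by COLUMNS.

    Everything from the first ':' onward (the progress bar) is ignored.
    """
    fields = line.split(":", 1)[0].split()
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    return dict(zip(COLUMNS, fields))


def non_finite_losses(line):
    """Return the names of the loss columns whose value is NaN or infinite."""
    row = parse_progress_line(line)
    return [name for name in LOSS_COLUMNS if not math.isfinite(float(row[name]))]


# Sample lines from the GPU and CPU runs, with the progress bar abbreviated.
gpu_line = "  1/19    0.357G       nan       nan       nan         4       512: 100%|...| 152/152"
cpu_line = "  0/19        0G    0.1147   0.02892   0.05367         5       512:   2%|...| 3/152"

print(non_finite_losses(gpu_line))  # ['box', 'obj', 'cls']
print(non_finite_losses(cpu_line))  # []
```

A check like this only reports the symptom. It does not say why the GPU run produces NaN values.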

On the other hand, when I run the same code on the CPU:

Logging results to runs\train\exp25
Starting training for 20 epochs…

 Epoch   gpu_mem       box       obj       cls    labels  img_size
  0/19        0G    0.1147   0.02892   0.05367         5       512:   2%|▋                                 | 3/152 [00:05<03:40,  1.48s/it]

Here the loss values are finite and look reasonable.

Do you have any idea how to solve this?

Thanks