CPU version being installed and used despite explicit install of pytorch-cuda=11.7

Note that the only PyTorch-related entry in my environment.yaml (used to create the conda “classification” environment) explicitly requests CUDA, yet the CPU version of PyTorch gets installed and used:

$ cat environment.yaml 
name: classification
channels:
  - defaults
  - pytorch
  - nvidia
  - conda-forge
dependencies:
  - matplotlib
  - pillow
  - transformers
  - pytorch-cuda=11.7

However, the CPU version is installed, and importing torch confirms that CUDA is unavailable:

(classification) :~/dev/classification$ nvidia-smi
Fri Feb 24 10:55:07 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.86.01    Driver Version: 515.86.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:09:00.0  On |                  N/A |
| 30%   37C    P3    47W / 290W |    732MiB /  8192MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      3675      G   /usr/lib/xorg/Xorg                460MiB |
|    0   N/A  N/A      3847      G   /usr/bin/gnome-shell              100MiB |
|    0   N/A  N/A      6740      G   ...6_64.v03.00.0074.AppImage       12MiB |
|    0   N/A  N/A     14603      G   ...489277852483348708,131072      156MiB |
+-----------------------------------------------------------------------------+
(classification) :~/dev/classification$ mamba list |grep torch
pytorch                   1.12.1          cpu_py310h9dbd814_1  
pytorch-cuda              11.7                 h67b0de4_1    pytorch
(classification) :~/dev/classification$ python -c 'import torch;print(torch.backends.mps.is_available(), torch.backends.mps.is_built())'
False False
(classification) :~/dev/classification$ python -c 'import torch;print(torch.cuda.is_available())'
False
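The build string in the mamba list output above already gives it away: cpu_py310h9dbd814_1 is a CPU-only build. A tiny helper can flag this from the build string alone (a sketch; it assumes the conda/pytorch-channel convention that CPU-only builds are prefixed with cpu_, and the second build string below is only illustrative):

```python
def is_cpu_build(build_string: str) -> bool:
    """Heuristically detect a CPU-only conda build of pytorch
    from its build string (e.g. 'cpu_py310h9dbd814_1')."""
    return build_string.startswith("cpu_")

# Build string from the `mamba list` output above:
print(is_cpu_build("cpu_py310h9dbd814_1"))            # True: CPU-only build
# A CUDA build string would look roughly like this (illustrative):
print(is_cpu_build("py3.10_cuda11.7_cudnn8.5.0_0"))   # False: CUDA build
```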

After spending the better part of yesterday and today wrestling with this, I was able to get PyTorch to use CUDA by uninstalling all prior versions of CUDA (nothing left under /usr/local/cuda*), reverting to NVIDIA’s driver version 515 for Ubuntu, deleting the conda environment, and recreating it with this command:

conda create -n classification cuda-toolkit==11.7 pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia

I had to specify the cuda-toolkit version explicitly because otherwise the solver installed almost everything for CUDA 12 rather than 11.7. Even after pinning 11.7, it still installed a number of the cuda-* packages for version 12.
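For anyone who prefers to keep an environment.yaml, the same pinning can roughly be expressed like this (a sketch derived from the conda create command above, not a tested file; channel order and the explicit pytorch/cuda-toolkit pins are the assumptions that differ from the original file):

```yaml
name: classification
channels:
  - pytorch       # listed first so the pytorch channel's CUDA builds win
  - nvidia
  - conda-forge
  - defaults
dependencies:
  - cuda-toolkit=11.7
  - pytorch
  - torchvision
  - torchaudio
  - pytorch-cuda=11.7
  - matplotlib
  - pillow
  - transformers
```

The key difference from the original file is that pytorch itself is requested explicitly alongside pytorch-cuda, rather than being pulled in as a transitive dependency where the solver is free to pick a CPU build.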

Here’s how the environment responds now:

(classification) :~/dev/classification$ python -c 'import torch;print(torch.cuda.is_available())'
True
(classification) :~/dev/classification$ python -c 'import torch;print(torch.backends.mps.is_available(), torch.backends.mps.is_built())'
False False