PyTorch on SageMaker has CUDA conflicts?

I need to use CUDA 11.8 with PyTorch, so I installed it using the command from the homepage:
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

However, when I run

import torch
torch.version.cuda

All I get as output is 11.2. I have the following env variable set as well:

LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda-11.8/lib64/

I have no idea where it is pulling the 11.2 from. When I run the following commands, I notice that the build string of the installed pytorch package contains cuda112.

(base) sagemaker-user@default:~$ conda list -n base | grep torch
pytorch                   2.0.0           cuda112py310he33e0d6_200    conda-forge
pytorch-cuda              11.8                 h7e8668a_5    pytorch
pytorch-gpu               2.0.0           cuda112py310h9871d0b_200    conda-forge
pytorch-lightning         2.0.9              pyhd8ed1ab_0    conda-forge
pytorch-metric-learning   1.7.3              pyhd8ed1ab_0    conda-forge
pytorch-mutex             1.0                        cuda    pytorch
torchaudio                2.0.0               py310_cu118    pytorch
torchvision               0.15.2          cuda112py310h0801bf5_1    conda-forge
(base) sagemaker-user@default:~$ conda list -n base | grep cuda-toolkit
cuda-toolkit              11.8.0                        0    nvidia

(base) sagemaker-user@default:~$ echo $CUDA_HOME

(base) sagemaker-user@default:~$ echo $CUDA_PATH

(base) sagemaker-user@default:~$ 

I would appreciate it if someone could help me address this. If you’ve experienced this before, could you explain why it might be happening? Thanks!

Edit:

I tried uninstalling the existing packages (via pip and conda) and reinstalling:

pip uninstall torch
pip uninstall torchvision
conda remove pytorch-gpu
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
conda list | grep torch

and this is what I get now:

sagemaker-user@default:~$ conda list | grep torch
pytorch                   2.0.0           cpu_mkl_py310h402c8e3_101    conda-forge
pytorch-cuda              11.8                 h7e8668a_5    pytorch
pytorch-lightning         2.0.9              pyhd8ed1ab_0    conda-forge
pytorch-metric-learning   1.7.3              pyhd8ed1ab_0    conda-forge
pytorch-mutex             1.0                        cuda    pytorch
torchaudio                2.0.0               py310_cu118    pytorch
torchmetrics              1.0.3              pyhd8ed1ab_0    conda-forge
torchvision               0.15.2          cpu_py310hb9e6163_1    conda-forge

import torch
print(torch.cuda.is_available()) # False
torch.version.cuda # Prints nothing
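
For context on the empty output: `torch.version.cuda` is `None` on a CPU-only build, and an interactive session displays nothing for `None`. A minimal sketch for checking which PyTorch installation Python actually imports is shown below. The helper name `describe_torch` is hypothetical and not part of PyTorch; it only wraps attributes that PyTorch exposes (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`).

```python
import importlib.util


def describe_torch() -> dict:
    """Summarize the torch installation visible to this interpreter.

    Hypothetical helper for illustration; returns {"installed": False}
    when torch cannot be found at all.
    """
    spec = importlib.util.find_spec("torch")
    if spec is None:
        return {"installed": False}

    import torch  # imported lazily so the function also works without torch

    return {
        "installed": True,
        "version": torch.__version__,
        # CUDA version the binary was built against; None for CPU-only builds
        "cuda_build": torch.version.cuda,
        # True only if a usable GPU and driver are visible at runtime
        "cuda_available": torch.cuda.is_available(),
        # Path of the package that was imported (shows which environment it came from)
        "location": spec.origin,
    }


if __name__ == "__main__":
    for key, value in describe_torch().items():
        print(f"{key}: {value}")
```

The `location` entry can help tell apart a conda-forge package in the base environment from a pip-installed wheel.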

You are installing the conda-forge PyTorch binaries, which are not built by us (or SageMaker is installing them). Our current stable and nightly binaries ship with CUDA 11.8 and 12.1 support, so you could consider installing those instead.
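
To make the channel mix-up visible, here is a rough sketch that reads `conda list` output and reports, for each package, the channel it came from and the compute variant suggested by its build string. The function names `classify_build` and `parse_conda_list` are hypothetical, and the build-string patterns (`cpu...`, `cuda112...`, `..._cu118`) are a heuristic inferred from the listings in this thread, not a documented conda convention.

```python
import re


def classify_build(build: str) -> str:
    """Guess the compute variant from a conda build string (heuristic)."""
    if build.startswith("cpu"):
        return "cpu"
    # Matches e.g. "cuda112..." or "..._cu118"; assumes a three-digit version
    match = re.search(r"cu(?:da)?_?(\d{3})", build)
    if match:
        digits = match.group(1)
        return f"cuda {digits[:2]}.{digits[2:]}"
    return "unknown"


def parse_conda_list(text: str) -> list:
    """Return (name, version, variant, channel) tuples from `conda list` output."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if line.startswith("#") or len(parts) < 3:
            continue  # skip header comments and malformed lines
        name, version, build = parts[:3]
        channel = parts[3] if len(parts) > 3 else "defaults"
        rows.append((name, version, classify_build(build), channel))
    return rows


sample = """\
pytorch                   2.0.0           cuda112py310he33e0d6_200    conda-forge
torchaudio                2.0.0               py310_cu118    pytorch
torchvision               0.15.2          cpu_py310hb9e6163_1    conda-forge
"""

for row in parse_conda_list(sample):
    print(row)
```

On the sample above, `pytorch` and `torchvision` are reported as coming from conda-forge, while only `torchaudio` comes from the `pytorch` channel with a CUDA 11.8 build.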

Hi ptrblck,

I did that. In the final section of my post I ran the official install command, but it now installs PyTorch with CPU-only support even though I requested the CUDA 11.8 binary.

The last step in your post still shows a CPU-only build installed from conda-forge. The official pip install command works fine for me:

pip install torch --index-url https://download.pytorch.org/whl/cu118
Looking in indexes: https://download.pytorch.org/whl/cu118
Collecting torch
  Downloading https://download.pytorch.org/whl/cu118/torch-2.2.1%2Bcu118-cp310-cp310-linux_x86_64.whl (819.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 819.2/819.2 MB 40.3 MB/s eta 0:00:00
Collecting filelock (from torch)
  Downloading https://download.pytorch.org/whl/filelock-3.9.0-py3-none-any.whl (9.7 kB)
Collecting typing-extensions>=4.8.0 (from torch)
  Downloading https://download.pytorch.org/whl/typing_extensions-4.8.0-py3-none-any.whl (31 kB)
Collecting sympy (from torch)
  Downloading https://download.pytorch.org/whl/sympy-1.12-py3-none-any.whl (5.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.7/5.7 MB 33.8 MB/s eta 0:00:00
Collecting networkx (from torch)
  Downloading https://download.pytorch.org/whl/networkx-3.2.1-py3-none-any.whl (1.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 34.7 MB/s eta 0:00:00
Collecting jinja2 (from torch)
  Downloading https://download.pytorch.org/whl/Jinja2-3.1.2-py3-none-any.whl (133 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 133.1/133.1 kB 330.9 MB/s eta 0:00:00
Collecting fsspec (from torch)
  Downloading https://download.pytorch.org/whl/fsspec-2023.4.0-py3-none-any.whl (153 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 154.0/154.0 kB 48.6 MB/s eta 0:00:00
Collecting nvidia-cuda-nvrtc-cu11==11.8.89 (from torch)
  Downloading https://download.pytorch.org/whl/cu118/nvidia_cuda_nvrtc_cu11-11.8.89-py3-none-manylinux1_x86_64.whl (23.2 MB)
...
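
The wheel downloaded above carries its CUDA variant in the local version suffix (`+cu118`, URL-encoded as `%2Bcu118`). As a small sketch, the same suffix can be extracted from a wheel URL or from a version string such as the one reported by `torch.__version__`. The helper names `local_version_tag` and `tag_from_wheel_url` are hypothetical, and the code assumes the usual `name-version-...whl` filename layout.

```python
from urllib.parse import unquote


def local_version_tag(version: str) -> str:
    """Return the part after '+' in a version string, or '' if there is none."""
    return version.partition("+")[2]


def tag_from_wheel_url(url: str) -> str:
    """Extract the local version tag from a wheel URL (assumes name-version-... layout)."""
    filename = unquote(url.rsplit("/", 1)[-1])  # decode %2B back to '+'
    version = filename.split("-")[1]            # second field is the version
    return local_version_tag(version)


url = "https://download.pytorch.org/whl/cu118/torch-2.2.1%2Bcu118-cp310-cp310-linux_x86_64.whl"
print(tag_from_wheel_url(url))           # cu118
print(local_version_tag("2.2.1+cu118"))  # cu118
```

An empty tag from `local_version_tag(torch.__version__)` would suggest that the imported build did not come from this wheel index.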