Hi. I'm trying to run my training pipeline inside the PyTorch Docker container from NVIDIA NGC: PyTorch | NVIDIA NGC
I start the container with:
$ docker run -it --gpus all nvcr.io/nvidia/pytorch:23.09-py3
PyTorch comes pre-installed in the image and works fine:
$ python
Python 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
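For reference, here is a small sketch of how the pre-installed build could be recorded before installing anything else, so there is something to compare against afterwards. The helper name `describe_torch` is made up for illustration; it only reads attributes that torch itself exposes.

```python
def describe_torch():
    """Return version details of the importable torch build, or None if torch is missing."""
    try:
        import torch
    except ImportError:
        return None
    return {
        # Full version string, including any local build tag after "+".
        "version": torch.__version__,
        # CUDA version torch was compiled against (None for CPU-only builds).
        "cuda_build": torch.version.cuda,
        # Whether a usable GPU is visible right now.
        "cuda_available": torch.cuda.is_available(),
    }


print(describe_torch())
```

Running it once before and once after a `pip install` makes an unintended change of the torch build easy to spot.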
I also need torchaudio, which is not pre-installed, so I run:
$ pip install torchaudio
Then I get:
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting torchaudio
Downloading torchaudio-2.0.2-cp310-cp310-manylinux1_x86_64.whl (4.4 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.4/4.4 MB 16.6 MB/s eta 0:00:00
Collecting torch==2.0.1 (from torchaudio)
Downloading torch-2.0.1-cp310-cp310-manylinux1_x86_64.whl (619.9 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 619.9/619.9 MB 25.3 MB/s eta 0:00:00
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch==2.0.1->torchaudio) (3.12.4)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch==2.0.1->torchaudio) (4.7.1)
Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch==2.0.1->torchaudio) (1.12)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch==2.0.1->torchaudio) (2.6.3)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch==2.0.1->torchaudio) (3.1.2)
Collecting nvidia-cuda-nvrtc-cu11==11.7.99 (from torch==2.0.1->torchaudio)
Downloading nvidia_cuda_nvrtc_cu11-11.7.99-2-py3-none-manylinux1_x86_64.whl (21.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 21.0/21.0 MB 25.7 MB/s eta 0:00:00
Collecting nvidia-cuda-runtime-cu11==11.7.99 (from torch==2.0.1->torchaudio)
Downloading nvidia_cuda_runtime_cu11-11.7.99-py3-none-manylinux1_x86_64.whl (849 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 849.3/849.3 kB 25.9 MB/s eta 0:00:00
Collecting nvidia-cuda-cupti-cu11==11.7.101 (from torch==2.0.1->torchaudio)
Downloading nvidia_cuda_cupti_cu11-11.7.101-py3-none-manylinux1_x86_64.whl (11.8 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.8/11.8 MB 24.1 MB/s eta 0:00:00
Collecting nvidia-cudnn-cu11==8.5.0.96 (from torch==2.0.1->torchaudio)
Downloading nvidia_cudnn_cu11-8.5.0.96-2-py3-none-manylinux1_x86_64.whl (557.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 557.1/557.1 MB 24.5 MB/s eta 0:00:00
Collecting nvidia-cublas-cu11==11.10.3.66 (from torch==2.0.1->torchaudio)
Downloading nvidia_cublas_cu11-11.10.3.66-py3-none-manylinux1_x86_64.whl (317.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 317.1/317.1 MB 23.5 MB/s eta 0:00:00
Collecting nvidia-cufft-cu11==10.9.0.58 (from torch==2.0.1->torchaudio)
Downloading nvidia_cufft_cu11-10.9.0.58-py3-none-manylinux1_x86_64.whl (168.4 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 168.4/168.4 MB 23.1 MB/s eta 0:00:00
Collecting nvidia-curand-cu11==10.2.10.91 (from torch==2.0.1->torchaudio)
Downloading nvidia_curand_cu11-10.2.10.91-py3-none-manylinux1_x86_64.whl (54.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 54.6/54.6 MB 27.0 MB/s eta 0:00:00
Collecting nvidia-cusolver-cu11==11.4.0.1 (from torch==2.0.1->torchaudio)
Downloading nvidia_cusolver_cu11-11.4.0.1-2-py3-none-manylinux1_x86_64.whl (102.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 102.6/102.6 MB 32.8 MB/s eta 0:00:00
Collecting nvidia-cusparse-cu11==11.7.4.91 (from torch==2.0.1->torchaudio)
Downloading nvidia_cusparse_cu11-11.7.4.91-py3-none-manylinux1_x86_64.whl (173.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 173.2/173.2 MB 21.3 MB/s eta 0:00:00
Collecting nvidia-nccl-cu11==2.14.3 (from torch==2.0.1->torchaudio)
Downloading nvidia_nccl_cu11-2.14.3-py3-none-manylinux1_x86_64.whl (177.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 177.1/177.1 MB 34.0 MB/s eta 0:00:00
Collecting nvidia-nvtx-cu11==11.7.91 (from torch==2.0.1->torchaudio)
Downloading nvidia_nvtx_cu11-11.7.91-py3-none-manylinux1_x86_64.whl (98 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 98.6/98.6 kB 269.8 MB/s eta 0:00:00
Collecting triton==2.0.0 (from torch==2.0.1->torchaudio)
Downloading triton-2.0.0-1-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (63.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 63.3/63.3 MB 21.2 MB/s eta 0:00:00
Requirement already satisfied: setuptools in /usr/local/lib/python3.10/dist-packages (from nvidia-cublas-cu11==11.10.3.66->torch==2.0.1->torchaudio) (68.2.2)
Requirement already satisfied: wheel in /usr/local/lib/python3.10/dist-packages (from nvidia-cublas-cu11==11.10.3.66->torch==2.0.1->torchaudio) (0.41.2)
Requirement already satisfied: cmake in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch==2.0.1->torchaudio) (3.27.4.1)
Collecting lit (from triton==2.0.0->torch==2.0.1->torchaudio)
Downloading lit-17.0.1.tar.gz (154 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 154.7/154.7 kB 59.7 MB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... done
Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch==2.0.1->torchaudio) (2.1.3)
Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch==2.0.1->torchaudio) (1.3.0)
Building wheels for collected packages: lit
Building wheel for lit (pyproject.toml) ... done
Created wheel for lit: filename=lit-17.0.1-py3-none-any.whl size=93271 sha256=a93faeb2fae041f3d7ac409cb3080196e55cca1dea6b86f6409545e0b7233269
Stored in directory: /tmp/pip-ephem-wheel-cache-8htmrdd0/wheels/cf/3a/a0/f65551951357f983270eb3b210b98c6be543f3ed5cf89deba4
Successfully built lit
Installing collected packages: lit, nvidia-nvtx-cu11, nvidia-nccl-cu11, nvidia-cusparse-cu11, nvidia-curand-cu11, nvidia-cufft-cu11, nvidia-cuda-runtime-cu11, nvidia-cuda-nvrtc-cu11, nvidia-cuda-cupti-cu11, nvidia-cublas-cu11, nvidia-cusolver-cu11, nvidia-cudnn-cu11, triton, torch, torchaudio
Attempting uninstall: triton
Found existing installation: triton 2.1.0+e621604
Uninstalling triton-2.1.0+e621604:
Successfully uninstalled triton-2.1.0+e621604
Attempting uninstall: torch
Found existing installation: torch 2.1.0a0+32f93b1
Uninstalling torch-2.1.0a0+32f93b1:
Successfully uninstalled torch-2.1.0a0+32f93b1
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchdata 0.7.0a0 requires torch==2.1.0a0+32f93b1, but you have torch 2.0.1 which is incompatible.
torchtext 0.16.0a0 requires torch==2.1.0a0+32f93b1, but you have torch 2.0.1 which is incompatible.
torchvision 0.16.0a0 requires torch==2.1.0a0+32f93b1, but you have torch 2.0.1 which is incompatible.
Successfully installed lit-17.0.1 nvidia-cublas-cu11-11.10.3.66 nvidia-cuda-cupti-cu11-11.7.101 nvidia-cuda-nvrtc-cu11-11.7.99 nvidia-cuda-runtime-cu11-11.7.99 nvidia-cudnn-cu11-8.5.0.96 nvidia-cufft-cu11-10.9.0.58 nvidia-curand-cu11-10.2.10.91 nvidia-cusolver-cu11-11.4.0.1 nvidia-cusparse-cu11-11.7.4.91 nvidia-nccl-cu11-2.14.3 nvidia-nvtx-cu11-11.7.91 torch-2.0.1 torchaudio-2.0.2 triton-2.0.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
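So pip uninstalls the pre-installed torch 2.1.0a0+32f93b1 (and triton 2.1.0+e621604) and installs torch 2.0.1 from PyPI together with torchaudio 2.0.2, apparently because torchaudio 2.0.2 requires torch==2.0.1. As the resolver error shows, this leaves torchdata, torchtext and torchvision incompatible with the installed torch, and it leads to weird errors later on. How do I install torchaudio properly in this case, without replacing the container's torch?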
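To make the swap easier to see in a long log, here is a minimal sketch that pulls the replaced packages out of pip's output. It assumes the log uses the same "Found existing installation" and "Successfully installed" line formats as the output above; the function name `summarize_pip_log` and the abridged `LOG` excerpt are only for illustration.

```python
import re


def summarize_pip_log(log: str) -> dict:
    """Map each package pip replaced to a (old_version, new_version) tuple."""
    # Versions pip found and removed, e.g.
    # "Found existing installation: torch 2.1.0a0+32f93b1"
    old = dict(re.findall(r"Found existing installation: (\S+) (\S+)", log))

    # Versions pip ended up with, from the "Successfully installed a-1.0 b-2.0" line.
    new = {}
    match = re.search(r"Successfully installed (.+)", log)
    if match:
        for item in match.group(1).split():
            # The version is everything after the last hyphen.
            name, _, version = item.rpartition("-")
            new[name] = version

    return {name: (version, new.get(name)) for name, version in old.items()}


# Abridged excerpt of the pip output quoted above.
LOG = """\
Found existing installation: triton 2.1.0+e621604
Uninstalling triton-2.1.0+e621604:
Found existing installation: torch 2.1.0a0+32f93b1
Uninstalling torch-2.1.0a0+32f93b1:
Successfully installed lit-17.0.1 torch-2.0.1 torchaudio-2.0.2 triton-2.0.0
"""

for name, (before, after) in summarize_pip_log(LOG).items():
    print(f"{name}: {before} -> {after}")
# triton: 2.1.0+e621604 -> 2.0.0
# torch: 2.1.0a0+32f93b1 -> 2.0.1
```

This only summarizes what already happened; it does not prevent pip from replacing the container's torch.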