Issue with pip installation of PyTorch

Lav · July 6, 2022, 11:37am

A few months ago, I installed chemprop via Visual Studio Code (Windows 10 64 bit).
The installation instructions say:
“on machines with GPUs, you may need to manually install a GPU-enabled version of PyTorch by following the instructions here”, where here links to the PyTorch Start Locally page.
I have a Nvidia GeForce RTX 3050 Ti laptop GPU.
At the time, the PyTorch pip installation code was:

pip3 install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio===0.11.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html

which I used, and it worked. The basic tests in python said that cuda was available, and:

torch.__version__
'1.11.0+cu113'

torch.cuda.get_arch_list()
['sm_37', 'sm_50', 'sm_60', 'sm_61', 'sm_70', 'sm_75', 'sm_80', 'sm_86', 'compute_37']

Chemprop also correctly called the GPU/cuda when necessary.

It is actually weird that this worked, because this post seems to say that my python output points to a 10.2 ‘runtime’, whatever that means (no idea, sorry, not an IT expert).
Plus, my current GPU driver is a bit old, and nvidia-smi says the max supported cuda is 11.2.
But OK, I only know that it works, somehow.

Now, a new version of chemprop came out, so I installed it (in a new conda environment), went back to PyTorch Start Locally to find the installation code, and now it looks like this:

pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113

Thinking that it was better to use the up-to-date pip installation code, I went ahead and ran it.
And the GPU was never again seen or used by torch or by chemprop (in the new environment; in the old one it still worked, luckily).

torch.__version__
'1.11.0+cu113'

but:

torch.cuda.is_available()
False

I looked at a lot of posts, some arguing that one should update the GPU driver, some describing incredibly complicated developer installations of separate packages, etc.:

https://stackoverflow.com/questions/69204737/torch-cuda-is-available-returns-false-why

https://stackoverflow.com/questions/60987997/why-torch-cuda-is-available-returns-false-even-after-installing-pytorch-with/61034368#61034368

https://stackoverflow.com/questions/70831932/cant-connect-to-gpu-when-building-pytorch-projects

Having no expertise whatsoever and not wanting to risk breaking my system, I simply tried removing the new conda environment, repeating the installation of chemprop, and then running the above old pip installation code for PyTorch.
This seemed to work. Now the tests again give the expected answers.

Does anybody know what may be going on?
Why is the new pip installation code not working, whereas the old one does, despite the apparent runtime conflict?

I have sort of resolved my issue, but I thought it may be useful to post this, in case my solution is not good/stable, and/or there is a better way to handle this.

E.g. in the above PyTorch discussion I linked, the user ends up saying “You were right. The env I was using had an incorrect version. Thanks go it working.”.
That’s very nice, but from my point of view as a non-expert, I have absolutely no idea how I can decide what ‘version’ my ‘env’ is using: I just run the code that is provided on the website and sort of hope it does the job… so maybe if someone could please explain what needs to be done, it would be great.

Thanks!

ptrblck · July 7, 2022, 12:07am

No, your output does not point to a CUDA10.2 runtime, as the sm_80 and sm_86 architectures are available and no errors are raised about the lack of Ampere support in the build.
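To make that check concrete, here is a minimal sketch of how one could test whether an arch list contains kernels for a given GPU. The helper name build_has_kernels_for is hypothetical (it is not a PyTorch function), and the rule it applies (same major compute capability, minor not higher than the device's) is a simplification that ignores the PTX entries such as compute_37.

```python
from typing import Iterable


def build_has_kernels_for(arch_list: Iterable[str], major: int, minor: int) -> bool:
    """Return True if arch_list has an sm_XY entry usable on capability major.minor.

    Simplified rule (an assumption of this sketch): an sm_XY entry is usable
    when X equals the device major and Y is not higher than the device minor.
    PTX entries such as 'compute_37' are skipped.
    """
    for arch in arch_list:
        if not arch.startswith("sm_"):
            continue  # skip PTX entries like compute_37
        digits = arch[3:]
        arch_major, arch_minor = int(digits[:-1]), int(digits[-1])
        if arch_major == major and arch_minor <= minor:
            return True
    return False


# Arch list reported in the first post (torch 1.11.0+cu113).
reported = ['sm_37', 'sm_50', 'sm_60', 'sm_61', 'sm_70', 'sm_75',
            'sm_80', 'sm_86', 'compute_37']

# Ampere laptop GPUs such as the RTX 3050 Ti have compute capability 8.6.
print(build_has_kernels_for(reported, 8, 6))
```

In a working environment the two numbers can be read from torch.cuda.get_device_capability() and the list from torch.cuda.get_arch_list().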

nvidia-smi reports the CUDA Toolkit version that was released with the driver. This is not the maximum supported version, as CUDA 11.x is compatible across minor releases (or at least it should be, as long as the libraries stick to that support).
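As a rough illustration of that rule of thumb, the sketch below compares the CUDA version printed by nvidia-smi with the tag of a PyTorch wheel (cu113, cu116, ...). Both function names are hypothetical, and the check only looks at the major version; real setups also have minimum driver version requirements that this does not model.

```python
from typing import Tuple


def parse_cuda_tag(tag: str) -> Tuple[int, int]:
    """Turn a wheel tag such as 'cu113' into (11, 3)."""
    if not (tag.startswith("cu") and tag[2:].isdigit() and len(tag) >= 4):
        raise ValueError(f"not a CUDA wheel tag: {tag!r}")
    digits = tag[2:]
    return int(digits[:-1]), int(digits[-1])


def driver_likely_ok(smi_cuda_version: str, wheel_tag: str) -> bool:
    """Rule of thumb only: the wheel's CUDA major version must not be newer
    than the major version that nvidia-smi reports for the driver."""
    smi_major = int(smi_cuda_version.split(".")[0])
    wheel_major, _ = parse_cuda_tag(wheel_tag)
    return wheel_major <= smi_major


# nvidia-smi reported 11.2 on the laptop; the wheels in question are cu113.
print(driver_likely_ok("11.2", "cu113"))
```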

No idea and I haven’t seen the issue before. Could you check the links where the binaries are downloaded from as I would expect these are the same (just accessed via different install commands)?

Lav · July 7, 2022, 6:27pm

Thank you @ptrblck , I had clearly not understood this ‘runtime’ thing, as I suspected.
Note however that the torch.cuda.get_arch_list() output I reported is the one where I used the old installation command, i.e. the one where the system works. When I had used the new command, the output was different. And I had also tried running a python torch command (can’t remember what) that was supposed to produce a more informative output, and indeed it said something along the lines that the GPU was not being used due to a wrong configuration.
I could make a new env and repeat the process with the new command, to get the exact output and share it, if that can help disentangle this story.

For the rest, the sentence “check the links where the binaries are” unfortunately does not mean much to me - as I mentioned, I really am no expert in anything related to IT.
I only know two things: 1) I needed to install PyTorch because the statistical modelling software chemprop requires it to use the machine’s Nvidia GPU; 2) the most recent PyTorch installation command, as provided by the PyTorch website, failed to make my GPU work correctly, whereas the old command succeeded.

This might be specific to my own GPU though, because I also installed chemprop + PyTorch in a different machine with a more powerful and up-to-date GPU, and I did not seem to have any problems there, with the new PyTorch command.

Thanks

ptrblck · July 7, 2022, 11:39pm

Yes, this would be great. Don’t worry about not understanding the details and sorry for not being clear enough.
In the end, the install command should work on your setup, so could you create a new environment, use the new (broken) command, and post the full install logs here, so that I could take a look at it, please?

Lav · July 8, 2022, 6:38pm

OK, done.
I installed chemprop again using Option 1 here, into conda environment ‘chemprop151_test’.
I cloned this environment by:

conda create --name chemprop151_test_copy --clone chemprop151_test

Then I installed PyTorch with cuda 11.3 in chemprop151_test, i.e. activated the environment in conda and did:

pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113

Output:

Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu113
Requirement already satisfied: torch in c:\users\username\miniconda3\envs\chemprop151_test\lib\site-packages (1.12.0)
Collecting torchvision
  Using cached https://download.pytorch.org/whl/cu113/torchvision-0.13.0%2Bcu113-cp38-cp38-win_amd64.whl (4.7 MB)
Collecting torchaudio
  Using cached https://download.pytorch.org/whl/cu113/torchaudio-0.12.0%2Bcu113-cp38-cp38-win_amd64.whl (1.2 MB)
Requirement already satisfied: typing-extensions in c:\users\username\miniconda3\envs\chemprop151_test\lib\site-packages (from torch) (4.3.0)
Requirement already satisfied: requests in c:\users\username\miniconda3\envs\chemprop151_test\lib\site-packages (from torchvision) (2.28.1)
Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in c:\users\username\miniconda3\envs\chemprop151_test\lib\site-packages (from torchvision) (9.2.0)
Requirement already satisfied: idna<4,>=2.5 in c:\users\username\miniconda3\envs\chemprop151_test\lib\site-packages (from requests->torchvision) (3.3)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in c:\users\username\miniconda3\envs\chemprop151_test\lib\site-packages (from requests->torchvision) (1.26.10)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\username\miniconda3\envs\chemprop151_test\lib\site-packages (from requests->torchvision) (2022.6.15)
Requirement already satisfied: charset-normalizer<3,>=2 in c:\users\username\miniconda3\envs\chemprop151_test\lib\site-packages (from requests->torchvision) (2.1.0)
Installing collected packages: torchvision, torchaudio
Successfully installed torchaudio-0.12.0+cu113 torchvision-0.13.0+cu113

python
Python 3.8.13 (default, Mar 28 2022, 06:59:08) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
False
>>> torch.__version__
'1.12.0+cpu'
>>> torch.cuda.get_arch_list()
[]

Then I installed PyTorch with cuda 11.6 in chemprop151_test_copy, i.e. activated the environment in conda and did:

pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116

Output:

Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu116
Requirement already satisfied: torch in c:\users\username\miniconda3\envs\chemprop151_test_copy\lib\site-packages (1.12.0)
Collecting torchvision
  Downloading https://download.pytorch.org/whl/cu116/torchvision-0.13.0%2Bcu116-cp38-cp38-win_amd64.whl (2.6 MB)
     |████████████████████████████████| 2.6 MB 1.6 MB/s
Collecting torchaudio
  Downloading https://download.pytorch.org/whl/cu116/torchaudio-0.12.0%2Bcu116-cp38-cp38-win_amd64.whl (1.2 MB)
     |████████████████████████████████| 1.2 MB 6.4 MB/s
Requirement already satisfied: typing-extensions in c:\users\username\miniconda3\envs\chemprop151_test_copy\lib\site-packages (from torch) (4.3.0)
Requirement already satisfied: numpy in c:\users\username\miniconda3\envs\chemprop151_test_copy\lib\site-packages (from torchvision) (1.23.0)
Requirement already satisfied: requests in c:\users\username\miniconda3\envs\chemprop151_test_copy\lib\site-packages (from torchvision) (2.28.1)
Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in c:\users\username\miniconda3\envs\chemprop151_test_copy\lib\site-packages (from torchvision) (9.2.0)
Requirement already satisfied: charset-normalizer<3,>=2 in c:\users\username\miniconda3\envs\chemprop151_test_copy\lib\site-packages (from requests->torchvision) (2.1.0)
Requirement already satisfied: idna<4,>=2.5 in c:\users\username\miniconda3\envs\chemprop151_test_copy\lib\site-packages (from requests->torchvision) (3.3)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in c:\users\username\miniconda3\envs\chemprop151_test_copy\lib\site-packages (from requests->torchvision) (1.26.10)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\username\miniconda3\envs\chemprop151_test_copy\lib\site-packages (from requests->torchvision) (2022.6.15)
Installing collected packages: torchvision, torchaudio
Successfully installed torchaudio-0.12.0+cu116 torchvision-0.13.0+cu116

python
Python 3.8.13 (default, Mar 28 2022, 06:59:08) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
False
>>> torch.__version__
'1.12.0+cpu'
>>> torch.cuda.get_arch_list()
[]

Finally, in a third environment (chemprop151_oldpytorch), I installed PyTorch via the old code:

pip3 install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio===0.11.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html

Output:

Looking in links: https://download.pytorch.org/whl/cu113/torch_stable.html
Collecting torch==1.11.0+cu113
  Using cached https://download.pytorch.org/whl/cu113/torch-1.11.0%2Bcu113-cp38-cp38-win_amd64.whl (2186.1 MB)
Collecting torchvision==0.12.0+cu113
  Using cached https://download.pytorch.org/whl/cu113/torchvision-0.12.0%2Bcu113-cp38-cp38-win_amd64.whl (5.4 MB)
Collecting torchaudio===0.11.0+cu113
  Using cached https://download.pytorch.org/whl/cu113/torchaudio-0.11.0%2Bcu113-cp38-cp38-win_amd64.whl (573 kB)
Requirement already satisfied: typing-extensions in c:\users\username\miniconda3\envs\chemprop151_oldpytorch\lib\site-packages (from torch==1.11.0+cu113) (4.3.0)
Requirement already satisfied: numpy in c:\users\username\miniconda3\envs\chemprop151_oldpytorch\lib\site-packages (from torchvision==0.12.0+cu113) (1.23.0)
Requirement already satisfied: requests in c:\users\username\miniconda3\envs\chemprop151_oldpytorch\lib\site-packages (from torchvision==0.12.0+cu113) (2.28.1)
Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in c:\users\username\miniconda3\envs\chemprop151_oldpytorch\lib\site-packages (from torchvision==0.12.0+cu113) (9.2.0)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in c:\users\username\miniconda3\envs\chemprop151_oldpytorch\lib\site-packages (from requests->torchvision==0.12.0+cu113) (1.26.10)
Requirement already satisfied: charset-normalizer<3,>=2 in c:\users\username\miniconda3\envs\chemprop151_oldpytorch\lib\site-packages (from requests->torchvision==0.12.0+cu113) (2.1.0)
Requirement already satisfied: idna<4,>=2.5 in c:\users\username\miniconda3\envs\chemprop151_oldpytorch\lib\site-packages (from requests->torchvision==0.12.0+cu113) (3.3)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\username\miniconda3\envs\chemprop151_oldpytorch\lib\site-packages (from requests->torchvision==0.12.0+cu113) (2022.6.15)
Installing collected packages: torch, torchvision, torchaudio
  Attempting uninstall: torch
    Found existing installation: torch 1.12.0
    Uninstalling torch-1.12.0:
      Successfully uninstalled torch-1.12.0
Successfully installed torch-1.11.0+cu113 torchaudio-0.11.0+cu113 torchvision-0.12.0+cu113

python
Python 3.8.13 (default, Mar 28 2022, 06:59:08) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.__version__
'1.11.0+cu113'
>>> torch.cuda.get_arch_list()
['sm_37', 'sm_50', 'sm_60', 'sm_61', 'sm_70', 'sm_75', 'sm_80', 'sm_86', 'compute_37']

I hope there is something in this that explains why the old installation works and the new one doesn’t.

Thanks again for your time!

ptrblck · July 8, 2022, 10:40pm

Thanks for the logs.

Let’s go through some points to isolate the issue.

The main difference between the “breaking” and “working” install commands is the version specification for each package:

# broken 
pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116

# working
pip3 install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio===0.11.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html

as well as the CUDA runtime for cu116.

Now let’s check what the “broken” commands are doing:

Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu113
Requirement already satisfied: torch in c:\users\username\miniconda3\envs\chemprop151_test\lib\site-packages (1.12.0)
...
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu116
Requirement already satisfied: torch in c:\users\username\miniconda3\envs\chemprop151_test_copy\lib\site-packages (1.12.0)

So it looks as if the torch requirement is already satisfied and PyTorch was already installed.
Checking the environment.yml also shows this requirement:

pytorch>=1.4.0

Your quick torch.__version__ check also shows that PyTorch is indeed installed from the latest stable release, but the CPU-only version:

>>> torch.__version__
'1.12.0+cpu'
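For anyone wondering how to tell which build an environment is using: the part after the "+" in torch.__version__ is the giveaway. A small sketch, assuming the usual tag format (+cpu, +cu113, ...); the function name describe_build is made up for this example and is not part of PyTorch.

```python
def describe_build(version: str) -> str:
    """Classify a torch version string by its local tag (the part after '+')."""
    _, _, local = version.partition("+")
    if local == "cpu":
        return "CPU-only build"
    digits = local[2:]
    if local.startswith("cu") and digits.isdigit() and len(digits) >= 2:
        # e.g. 'cu113' -> CUDA 11.3, 'cu102' -> CUDA 10.2
        return f"CUDA {digits[:-1]}.{digits[-1]} build"
    return "unknown build (no recognisable tag)"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        print(torch.__version__, "->", describe_build(torch.__version__))
        print("CUDA available:", torch.cuda.is_available())
```

Run this with the Python of the activated environment; a "CPU-only build" answer means torch.cuda.is_available() will stay False regardless of the driver.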

pip install might not be smart enough to figure out that you want to install the PyTorch wheels with the CUDA runtime: it checks for an already installed torch package, finds it, and skips the install for torch.

However, if you specify the version directly via:

pip3 install torch==1.11.0+cu113 ...

the install command will see a version mismatch (note that the requirements from chemprop installed torch==1.12.0+cpu) and will download it:

Looking in links: https://download.pytorch.org/whl/cu113/torch_stable.html
Collecting torch==1.11.0+cu113
  Using cached https://download.pytorch.org/whl/cu113/torch-1.11.0%2Bcu113-cp38-cp38-win_amd64.whl (2186.1 MB)

Note also that the previously CPU-only version will be uninstalled in this process:

Installing collected packages: torch, torchvision, torchaudio
  Attempting uninstall: torch
    Found existing installation: torch 1.12.0
    Uninstalling torch-1.12.0:
      Successfully uninstalled torch-1.12.0
Successfully installed torch-1.11.0+cu113 ...

which explains why the version tag works for you.
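The difference between the two commands can be mimicked with a toy model. This is not pip's actual resolver, only a simplified sketch of the "==" matching rule as I understand it: a requirement without a version is satisfied by whatever is installed, while a pin that includes a tag such as +cu113 must match exactly. The function names are hypothetical, and versions are compared as plain strings without normalisation.

```python
from typing import Optional, Tuple


def split_local(version: str) -> Tuple[str, str]:
    """Split '1.11.0+cu113' into ('1.11.0', 'cu113'); the tag may be empty."""
    public, _, local = version.partition("+")
    return public, local


def pin_satisfied(installed: str, pin: Optional[str]) -> bool:
    """Toy model: would an already installed version satisfy the requirement?"""
    if pin is None:
        return True  # bare 'torch': anything installed is good enough
    pin_public, pin_local = split_local(pin)
    inst_public, inst_local = split_local(installed)
    if pin_local:
        # A pin with a tag only matches the same version with the same tag.
        return (inst_public, inst_local) == (pin_public, pin_local)
    return inst_public == pin_public


# pip saw torch 1.12.0 in the environment (see the logs above).
print(pin_satisfied("1.12.0", None))            # new command: nothing to do
print(pin_satisfied("1.12.0", "1.11.0+cu113"))  # old command: mismatch, so install
```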

In summary: the new command (the one without version pins) should also work if you uninstall the CPU-only version manually before running it.
I hope it helps.
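A minimal sketch of that two-step procedure, driven from Python so that it uses the pip of the currently active environment. The helper names are made up for this example; the pip options used (uninstall -y, install --extra-index-url) are standard. It only prints the commands unless dry_run is set to False, and I have not verified it on a conda-installed torch other than what the logs above show.

```python
import subprocess
import sys
from typing import List

PACKAGES = ["torch", "torchvision", "torchaudio"]


def reinstall_commands(cuda_tag: str = "cu113") -> List[List[str]]:
    """Build the uninstall and install commands for the given wheel tag."""
    pip = [sys.executable, "-m", "pip"]
    index = f"https://download.pytorch.org/whl/{cuda_tag}"
    return [
        pip + ["uninstall", "-y"] + PACKAGES,
        pip + ["install"] + PACKAGES + ["--extra-index-url", index],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(reinstall_commands("cu113"), dry_run=True)
```

Afterwards, torch.__version__ should end in +cu113 rather than +cpu if the CUDA wheel was picked up.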

Lav · July 9, 2022, 7:13am

Thanks, yes, it makes perfect sense. So the problem is that chemprop’s installation puts there a CPU version of Torch, which is not replaced by the new PyTorch installation command.
I would argue that the issue comes a bit from both sides then. Chemprop is correct in installing Torch-CPU, so the software can work even for users who don’t have GPUs; but the previous PyTorch installation command made sure to install the correct version of Torch, regardless of what was there before. Perhaps this should always be the case, or am I wrong? Won’t a user who installs PyTorch by this command generally want the correct/best/up-to-date version of Torch to be installed?
Anyway, I wonder why the new command instead worked perfectly OK on the other machine, because the procedure I followed was 100% the same.

We can call it problem solved, assuming that I find the command to uninstall Torch-CPU from chemprop. I will have a look at the manual.

Thanks!

ptrblck · July 10, 2022, 12:51am

Yes, I think you are right and pip already thinks you are on the latest and greatest PyTorch version (1.12.0 was already found). However, I also think that the version is only checked and pip is not aware of the CUDA runtimes unfortunately and thus doesn’t detect that you want to use 1.12.0+cu113 instead.

If the exact same workflow produced a different result on another machine, I would guess that pip might have been able to detect the version conflict, but you should check the install logs again (and post them here) so that I could take a look at them.

Lav · July 28, 2022, 4:16pm

Yeah, actually I was wrong: the procedure I used on the Linux machine was not 100% the same.
I stopped after installing chemprop, because the torch.cuda.is_available() already said True, and:

torch.__version__
'1.12.0+cu102'
torch.cuda.get_arch_list()
['sm_37', 'sm_50', 'sm_60', 'sm_70']

Maybe this means I am not using the latest or best version, but as long as chemprop works…