Help installing 1.3

I am trying to install the latest PyTorch using the conda installation on my Windows 10 machine. I get the following error - can anyone help me understand what I should do to fix it? I have previously installed CUDA 10.1 and it’s in the PATH, but I do have other versions of CUDA available in my NVIDIA GPU Computing Toolkit\CUDA folder.

conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
Collecting package metadata (current_repodata.json): done
Solving environment: failed
Collecting package metadata (repodata.json): done
Solving environment: failed

UnsatisfiableError: The following specifications were found to be incompatible with each other:

  - conda-forge/noarch::bleach==3.1.0=py_0
  - conda-forge/noarch::nbconvert==5.5.0=py_0 -> bleach
  - conda-forge/win-64::arrow-cpp==0.14.0=py36h1b0c03e_0 -> boost-cpp[version='>=1.70.0,<1.70.1.0a0'] -> libboost[version='<0']
  - conda-forge/win-64::astroid==2.2.5=py36_0 -> wrapt
  - conda-forge/win-64::boost-cpp==1.70.0=h6a4c333_0 -> libboost[version='<0']
  - conda-forge/win-64::bzip2==1.0.6=hfa6e2cd_1002
  - conda-forge/win-64::conda-package-handling==1.3.10=py36_0 -> libarchive[version='>=3.3.3'] -> bzip2[version='>=1.0.8,<2.0a0']
  - conda-forge/win-64::libarchive==3.3.3=h4890af2_1005 -> bzip2[version='>=1.0.6,<2.0a0']
  - conda-forge/win-64::libkml==1.3.0=h4ece8bf_1010 -> boost-cpp[version='>=1.70.0,<1.70.1.0a0'] -> libboost[version='<0']
  - conda-forge/win-64::libnetcdf==4.6.2=h396784b_1001 -> bzip2[version='>=1.0.6,<2.0a0']
  - conda-forge/win-64::pyarrow==0.14.0=py36h803c963_0 -> arrow-cpp[version='>=0.14.0,<0.15.0a0,>=0.14.0,<1.0a0'] -> boost-cpp[version='>=1.70.0,<1.70.1.0a0'] -> libboost[version='<0']
  - conda-forge/win-64::pyarrow==0.14.0=py36h803c963_0 -> arrow-cpp[version='>=0.14.0,<0.15.0a0,>=0.14.0,<1.0a0'] -> bzip2[version='>=1.0.8,<2.0a0']
  - conda-forge/win-64::pylint==2.3.1=py36_0 -> astroid[version='>=2.2.0'] -> wrapt
  - conda-forge/win-64::python-libarchive-c==2.8=py36_1004 -> libarchive -> bzip2[version='>=1.0.8,<2.0a0']
  - conda-forge/win-64::spyder==3.3.5=py36_0 -> pylint -> astroid[version='>=2.3.0,<2.4'] -> wrapt
  - conda-forge/win-64::thrift-cpp==0.12.0=hd042d19_1004 -> boost-cpp[version='>=1.70.0,<1.70.1.0a0'] -> libboost[version='<0']
  - conda-forge/win-64::wrapt==1.11.2=py36hfa6e2cd_0
  - cudatoolkit=10.1
  - cudnn -> cudatoolkit=9.0
  - cupy -> cudatoolkit=9.0

I also tried with CUDA 9.2 compatibility - but the same sort of thing happens:

conda install pytorch torchvision cudatoolkit=9.2 -c pytorch -c defaults -c numba/label/dev
Collecting package metadata (current_repodata.json): done
Solving environment: failed
Collecting package metadata (repodata.json): done
Solving environment: failed

UnsatisfiableError: The following specifications were found to be incompatible with each other:

  - conda-forge/noarch::bleach==3.1.0=py_0
  - conda-forge/noarch::nbconvert==5.5.0=py_0 -> bleach
  - conda-forge/win-64::arrow-cpp==0.14.0=py36h1b0c03e_0 -> boost-cpp[version='>=1.70.0,<1.70.1.0a0'] -> libboost[version='<0']
  - conda-forge/win-64::astroid==2.2.5=py36_0 -> wrapt
  - conda-forge/win-64::boost-cpp==1.70.0=h6a4c333_0 -> libboost[version='<0']
  - conda-forge/win-64::bzip2==1.0.6=hfa6e2cd_1002
  - conda-forge/win-64::conda-package-handling==1.3.10=py36_0 -> libarchive[version='>=3.3.3'] -> bzip2[version='>=1.0.8,<2.0a0']
  - conda-forge/win-64::libarchive==3.3.3=h4890af2_1005 -> bzip2[version='>=1.0.6,<2.0a0']
  - conda-forge/win-64::libkml==1.3.0=h4ece8bf_1010 -> boost-cpp[version='>=1.70.0,<1.70.1.0a0'] -> libboost[version='<0']
  - conda-forge/win-64::libnetcdf==4.6.2=h396784b_1001 -> bzip2[version='>=1.0.6,<2.0a0']
  - conda-forge/win-64::pyarrow==0.14.0=py36h803c963_0 -> arrow-cpp[version='>=0.14.0,<0.15.0a0,>=0.14.0,<1.0a0'] -> boost-cpp[version='>=1.70.0,<1.70.1.0a0'] -> libboost[version='<0']
  - conda-forge/win-64::pyarrow==0.14.0=py36h803c963_0 -> arrow-cpp[version='>=0.14.0,<0.15.0a0,>=0.14.0,<1.0a0'] -> bzip2[version='>=1.0.8,<2.0a0']
  - conda-forge/win-64::pylint==2.3.1=py36_0 -> astroid[version='>=2.2.0'] -> wrapt
  - conda-forge/win-64::python-libarchive-c==2.8=py36_1004 -> libarchive -> bzip2[version='>=1.0.8,<2.0a0']
  - conda-forge/win-64::spyder==3.3.5=py36_0 -> pylint -> astroid[version='>=2.3.0,<2.4'] -> wrapt
  - conda-forge/win-64::thrift-cpp==0.12.0=hd042d19_1004 -> boost-cpp[version='>=1.70.0,<1.70.1.0a0'] -> libboost[version='<0']
  - conda-forge/win-64::wrapt==1.11.2=py36hfa6e2cd_0
  - cudatoolkit=9.2
  - cudnn -> cudatoolkit[version='>=9.0,<9.1']
  - cupy -> cudatoolkit[version='>=9.0,<9.1.0a0']

cc @smth maybe you know where this comes from?

I suspect I just have too much going on in this environment, and can of course start from scratch with a new one, but this is something I would like to solve without having to do that, if possible…

Just with respect to CUDA: it seems that cudnn and cupy require CUDA 9.0, so you cannot install the newer versions. Maybe you want to remove those packages.
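For example, something like this should remove both packages in one step (assuming you no longer need them in this environment):

conda uninstall cudnn cupy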

Hmm… I tried uninstalling cudnn and now see inconsistencies with other packages. Should I remove these first?

conda uninstall cudnn
Collecting package metadata (repodata.json): done
Solving environment: \
The environment is inconsistent, please check the package plan carefully
The following packages are causing the inconsistency:

  - conda-forge/noarch::tensorboardx==1.6=py_0
  - conda-forge/win-64::tensorflow-estimator==1.13.0=py36h39e3cac_0
  - conda-forge/win-64::tensorflow==1.13.1=py36h21ff451_1
  - defaults/win-64::tensorflow-gpu==1.13.1=h0d30ee6_0
  - conda-forge/noarch::tensorflow-hub==0.6.0=pyhe1b5a44_0
  - anaconda/win-64::keras-gpu==2.2.4=0

The main reason I want to stick with this (base) environment is to keep using OpenAI Gym with PyTorch, which was an ordeal to install under Windows 10. I suspect that some of these dependencies are due to the installation of Gym.

More on my conda environment - BTW, I have not updated conda, as the latest version on another PC completely screwed up my conda installation, so I am reluctant to risk having to start again from scratch should the same happen here…:

     active environment : base
    active env location : G:\Anaconda3
            shell level : 1
       user config file : C:\Users\User\.condarc
 populated config files : C:\Users\User\.condarc
          conda version : 4.7.5
    conda-build version : 3.18.6
         python version : 3.6.6.final.0
       virtual packages : __cuda=10.1
       base environment : G:\Anaconda3  (writable)
           channel URLs : https://conda.anaconda.org/conda-forge/win-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://conda.anaconda.org/ostrokach-forge/win-64
                          https://conda.anaconda.org/ostrokach-forge/noarch
                          https://conda.anaconda.org/anaconda-fusion/win-64
                          https://conda.anaconda.org/anaconda-fusion/noarch
                          https://repo.anaconda.com/pkgs/main/win-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/win-64
                          https://repo.anaconda.com/pkgs/r/noarch
                          https://repo.anaconda.com/pkgs/msys2/win-64
                          https://repo.anaconda.com/pkgs/msys2/noarch
          package cache : G:\Anaconda3\pkgs
                          C:\Users\User\.conda\pkgs
                          C:\Users\User\AppData\Local\conda\conda\pkgs
       envs directories : G:\Anaconda3\envs
                          C:\Users\User\.conda\envs
                          C:\Users\User\AppData\Local\conda\conda\envs
               platform : win-64
             user-agent : conda/4.7.5 requests/2.22.0 CPython/3.6.6 Windows/10 Windows/10.0.17134
          administrator : False
             netrc file : None
           offline mode : False

I don’t usually use conda or Windows so I’m not sure how to help you here :confused:
I can’t see the inconsistency in the new package list this time, but the tensorflow/keras install does seem to be problematic.

Thanks for looking anyway @albanD - I appreciate it. It looks like I will need to stay with my current PyTorch 1.1 in order to keep compatibility with the requirements / dependencies of OpenAI Gym, Unity Agents etc. that I am using for the Udacity Deep Reinforcement Learning Nanodegree.

One of the reasons I wanted to upgrade to 1.3 was that I am encountering what might be a bug in using nn.SELU (showing up in functional.py). Perhaps you could help me fix this without having to upgrade - is it a bug, or something wrong with how I’m using SELU?

G:\Anaconda3\lib\site-packages\torch\nn\modules\linear.py in forward(self, input)
     90     @weak_script_method
     91     def forward(self, input):
---> 92         return F.linear(input, self.weight, self.bias)
     93 
     94     def extra_repr(self):

G:\Anaconda3\lib\site-packages\torch\nn\functional.py in linear(input, weight, bias)
   1402         - Output: :math:`(N, *, out\_features)`
   1403     """
-> 1404     if input.dim() == 2 and bias is not None:
   1405         # fused op is marginally faster
   1406         ret = torch.addmm(bias, input, weight.t())

G:\Anaconda3\lib\site-packages\torch\nn\modules\module.py in __getattr__(self, name)
    537                 return modules[name]
    538         raise AttributeError("'{}' object has no attribute '{}'".format(
--> 539             type(self).__name__, name))
    540 
    541     def __setattr__(self, name, value):

AttributeError: 'SELU' object has no attribute 'dim'

This error is weird. Are you sure you don’t forward one module into another? The input to your linear layer here seems to be a SELU module.

Hi @albanD. Sorry, I don’t understand your question. Can you ask another way?

My forward function…

[screenshot of the forward function; the code is reproduced below]

Exactly. nn.Modules are first created, then forwarded by calling them.
x here contains the nn.Module itself, not the output of the SELU :slight_smile:
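Here is a minimal sketch that reproduces the error (the shapes are illustrative; on PyTorch 1.1 the failing call raises exactly the AttributeError above):

    import torch
    import torch.nn as nn

    fc1 = nn.Linear(4, 3)
    fc2 = nn.Linear(3, 2)
    x = torch.randn(1, 4)

    # Wrong: nn.SELU(...) builds a SELU *module* - the tensor is silently taken
    # as the `inplace` constructor argument - and feeding that module into fc2
    # raises: AttributeError: 'SELU' object has no attribute 'dim'
    # h = fc2(nn.SELU(fc1(x)))

    # Right: create the module first, then call it on the tensor.
    h = fc2(nn.SELU()(fc1(x)))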

Hmmm. The commented code uses relu, and it worked. I was trying to substitute SELU instead. The syntax looks the same to me, but it doesn’t work with SELU? Can you please rewrite this the way that would work? Should I use it from the functional module rather than from nn?

    def forward(self, state):
        """Build an actor (policy) network that maps states -> actions."""
        # x = F.relu(self.bn1(self.fc1(state)))
        # x = F.relu(self.fc2(x))
        x = nn.SELU(self.bn1(self.fc1(state)))
        x = nn.SELU(self.fc2(x))
        return torch.tanh(self.fc3(x))

Functions from the functional module (called F in your example) are functions (hence the module name).
The other elements in nn are Modules, so you need to first create an instance of one: self.my_selu_instance = nn.SELU(inplace=False), and then you can use it: x = self.my_selu_instance(self.fc2(x)).

So the updated code would be:

    def forward(self, state):
        """Build an actor (policy) network that maps states -> actions."""
        # x = F.relu(self.bn1(self.fc1(state)))
        # x = F.relu(self.fc2(x))
        x = nn.SELU()(self.bn1(self.fc1(state)))
        x = nn.SELU()(self.fc2(x))
        return torch.tanh(self.fc3(x))

Or even nicer:

    def __init__():
        # Original code
        self.selu = nn.SELU()

    def forward(self, state):
        """Build an actor (policy) network that maps states -> actions."""
        # x = F.relu(self.bn1(self.fc1(state)))
        # x = F.relu(self.fc2(x))
        x = self.selu(self.bn1(self.fc1(state)))
        x = self.selu(self.fc2(x))
        return torch.tanh(self.fc3(x))

Oh, I see - that’s a really elementary mistake on my part. I have been used to using the functional module for activation functions and hadn’t realized the difference between using F and nn. But of course it makes sense now - I don’t call any other nn objects in my code as if they were functions!

Is there any “best practice” around the use of nn rather than F?

Really appreciate your help!

nn vs F is really a matter of personal preference. We actually provide both because some people really prefer the Module version, to be able to put things in nn.Sequential()s, while others prefer writing the forward fully by hand with a functional interface.
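For comparison, here is the same forward written with the functional interface (a sketch assuming the same fc1/fc2/fc3/bn1 layers as in your code above):

    import torch
    import torch.nn.functional as F

    def forward(self, state):
        """Build an actor (policy) network that maps states -> actions."""
        x = F.selu(self.bn1(self.fc1(state)))
        x = F.selu(self.fc2(x))
        return torch.tanh(self.fc3(x))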

OK, thanks - I am very grateful for your attention to this!


Another thing I wanted to use from 1.3 was AdamW - is there any way I can get that working from within 1.1?

The implementation in master seems self-contained. You should be able to copy/paste this file and change the import of Optimizer to from torch.optim import Optimizer. That will most likely work :smiley:
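For example (adamw.py is a hypothetical filename for the copied file, and the model is a stand-in):

    # in the copied adamw.py, change the relative import at the top:
    #   from .optimizer import Optimizer  ->  from torch.optim import Optimizer

    # then, in your own code:
    import torch.nn as nn
    from adamw import AdamW

    model = nn.Linear(10, 2)  # stand-in for your actual network
    optimizer = AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)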

Awesome - by importing from torch.optim, am I effectively chaining this onto the end of everything we have available in torch.optim?

Hi, Chris_Palmer,

What @albanD wanted to point out is to copy the AdamW optimizer, since it is just a single self-contained Python file.

You can create your own optimizers in PyTorch with no C++ hassle involved.
And the point is that you don’t need to migrate to PyTorch 1.3 if you only want to use AdamW.
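As an illustration, a bare-bones optimizer is just an Optimizer subclass with a step() method - a minimal sketch (the class name and update rule are purely illustrative, written against the 1.x-era in-place API):

    import torch
    from torch.optim import Optimizer

    class PlainSGD(Optimizer):
        """Minimal SGD, just to show the Optimizer subclass structure."""

        def __init__(self, params, lr=1e-2):
            super(PlainSGD, self).__init__(params, dict(lr=lr))

        def step(self, closure=None):
            loss = closure() if closure is not None else None
            for group in self.param_groups:
                for p in group['params']:
                    if p.grad is not None:
                        # classic update: p <- p - lr * grad
                        p.data.add_(-group['lr'], p.grad.data)
            return loss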

BTW, based on your line:

conda install pytorch torchvision cudatoolkit=10.1 -c pytorch

I installed PyTorch 1.3.0 using conda on Windows and it works just fine. Thank you so much.