Building PyTorch from source: how to get it working with an old GPU?

Hi all,

I am attempting to install PyTorch 1.0.1 and get it working with my GTX 760. As you may know, the GTX 760 is no longer supported past 0.3.1, but a friend suggested that building from source would let me use PyTorch 1.x.x with my GPU.

My question is: at what point in the build do I make the change so that my old GPU will work? I am following the build instructions found on the GitHub repo; what step should I add to get my GPU working?

I want to avoid using an older PyTorch version if possible.

Was the GPU detected during the compilation?
If not, you might want to add CUDA_ARCH_LIST="3.0" to your build command.
I think 3.0 should be the right one for Kepler GPUs, but I’m not sure, so you might want to look it up if that’s not working.
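For reference, this is roughly what setting the arch list before building looks like. A sketch only: in current PyTorch sources the variable is spelled TORCH_CUDA_ARCH_LIST, so verify the exact name against your checkout (the name quoted above is corrected later in this thread).

```shell
# Target Kepler (sm_30) explicitly instead of letting the build auto-detect.
export TORCH_CUDA_ARCH_LIST="3.0"
echo "TORCH_CUDA_ARCH_LIST=$TORCH_CUDA_ARCH_LIST"
# python setup.py install   # then run the build from the pytorch checkout
```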

Thanks for your response. No, it was not detected.

I found it will detect the GPU during the build if I run the following:

sudo CUDA_HOME="/opt/cuda" python setup.py install

Now I am getting another error :frowning:

make[2]: Entering directory '/root/pytorch/third_party/nccl/nccl/src/collectives/device'
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
nvlink fatal   : Internal error: reference to deleted section
make[2]: *** [Makefile:83: /root/pytorch/build/nccl/obj/collectives/device/devlink.o] Error 1
make[2]: Leaving directory '/root/pytorch/third_party/nccl/nccl/src/collectives/device'
make[1]: *** [Makefile:45: devicelib] Error 2
make[1]: Leaving directory '/root/pytorch/third_party/nccl/nccl/src'
make: *** [Makefile:25:] Error 2
[10/2853] Building CXX object third_party/protobuf/cmake/CMakeFiles/libprotoc.dir/__/src/google/protobuf/compiler/csharp/
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "", line 734, in <module>
  File "", line 281, in build_deps
  File "/root/pytorch/tools/", line 248, in build_caffe2
    check_call(ninja_cmd, cwd=build_dir, env=my_env)
  File "/opt/anaconda/lib/python3.6/", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['ninja', 'install']' returned non-zero exit status 1.

Adding your suggestion gives the same error as above; installing without CUDA gives no errors.

Could you try to export the env variable before running python setup.py install?
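A sketch of exporting first and then running the install (the TORCH_CUDA_ARCH_LIST name is my assumption; check setup.py in your checkout). One detail worth noting: plain sudo drops exported variables, so sudo -E is needed to carry them through.

```shell
export CUDA_HOME="/opt/cuda"          # where pacman's cuda package installs
export TORCH_CUDA_ARCH_LIST="3.0"     # assumed variable name; verify in the repo
echo "building with CUDA_HOME=$CUDA_HOME"
# sudo -E python setup.py install     # -E preserves the exported vars under sudo
```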

I think I posted the wrong env variable name.


So the build goes further, but now it throws the same ninja error as before.

the full error can be found at

So I was able to figure out where the issue lies, but I'm not sure how to fix it.

The issue is with CUDA. I tried again and compiling is fine when I set USE_CUDA to FALSE, but when I enable CUDA it spits out that ninja error above. How should I go about fixing this?

Which CUDA do I point it to: the Anaconda CUDA toolkit or the one I installed with pacman -S cuda? I will try everything I can for now; I feel I am getting closer.

If you’re building from source, you should use your own installed CUDA.
Not sure if the conda CUDA toolkit could work, as I’ve never tried it.
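One way to sanity-check which toolkit the build will pick up (the /opt/cuda path is the one from this thread; adjust for your system):

```shell
# Point the build at the system toolkit rather than conda's copy.
CUDA_HOME="${CUDA_HOME:-/opt/cuda}"
export CUDA_HOME
echo "CUDA_HOME=$CUDA_HOME"
# nvcc should resolve from inside CUDA_HOME; print where it actually lives.
command -v nvcc || echo "nvcc not on PATH (add $CUDA_HOME/bin to PATH)"
```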

Okay, so I reinstalled CUDA and it worked up until the last step, which is installation.

Now it gives me this error. I checked, and it should have cuDNN; I have it installed:

-- Building with NumPy bindings
-- Not using cuDNN
-- Not using MIOpen
-- Detected CUDA at /opt/cuda
-- Using MKLDNN
-- Building NCCL library
-- Building with THD distributed package 
-- Building with c10d distributed package 

Copying extension caffe2.python.caffe2_pybind11_state
Copying caffe2.python.caffe2_pybind11_state from torch/lib/python3.7/site-packages/caffe2/python/ to /home/sammy/pytorch/build/lib.linux-x86_64-3.7/caffe2/python/
copying torch/lib/python3.7/site-packages/caffe2/python/ -> /home/sammy/pytorch/build/lib.linux-x86_64-3.7/caffe2/python

Copying extension caffe2.python.caffe2_pybind11_state_gpu
Copying caffe2.python.caffe2_pybind11_state_gpu from torch/lib/python3.7/site-packages/caffe2/python/ to /home/sammy/pytorch/build/lib.linux-x86_64-3.7/caffe2/python/
copying torch/lib/python3.7/site-packages/caffe2/python/ -> /home/sammy/pytorch/build/lib.linux-x86_64-3.7/caffe2/python
building 'torch._C' extension
creating build/temp.linux-x86_64-3.7
creating build/temp.linux-x86_64-3.7/torch
creating build/temp.linux-x86_64-3.7/torch/csrc
/usr/bin/gcc-4.9 -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/opt/anaconda/include/python3.7m -c torch/csrc/stub.cpp -o build/temp.linux-x86_64-3.7/torch/csrc/stub.o -std=c++11 -Wall -Wextra -Wno-strict-overflow -Wno-unused-parameter -Wno-missing-field-initializers -Wno-write-strings -Wno-unknown-pragmas -Wno-deprecated-declarations -fno-strict-aliasing -Wno-missing-braces
cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
/usr/bin/g++-4.9 -pthread -shared -B /opt/anaconda/compiler_compat -L/opt/anaconda/lib -Wl,-rpath=/opt/anaconda/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.7/torch/csrc/stub.o -L/home/sammy/pytorch/torch/lib -L/opt/cuda/lib64 -lshm -ltorch_python -o build/lib.linux-x86_64-3.7/torch/ -Wl,--no-as-needed /home/sammy/pytorch/torch/lib/ -Wl,--as-needed -Wl,-rpath,$ORIGIN/lib
/opt/anaconda/compiler_compat/ld: build/temp.linux-x86_64-3.7/torch/csrc/stub.o: unable to initialize decompress status for section .debug_info
/opt/anaconda/compiler_compat/ld: build/temp.linux-x86_64-3.7/torch/csrc/stub.o: unable to initialize decompress status for section .debug_info
/opt/anaconda/compiler_compat/ld: build/temp.linux-x86_64-3.7/torch/csrc/stub.o: unable to initialize decompress status for section .debug_info
/opt/anaconda/compiler_compat/ld: build/temp.linux-x86_64-3.7/torch/csrc/stub.o: unable to initialize decompress status for section .debug_info
build/temp.linux-x86_64-3.7/torch/csrc/stub.o: file not recognized: file format not recognized
collect2: error: ld returned 1 exit status
error: command '/usr/bin/g++-4.9' failed with exit status 1

Update: this person had the same error as me; I will attempt his solution and report back.

It installed fine, but it turns out v1.1 of PyTorch does not work with compute capability 3.0 and needs 3.5+, even when installing from source? This is the error I get when I try to run an example:

/opt/anaconda/lib/python3.7/site-packages/torch/cuda/ UserWarning: 
    Found GPU0 GeForce GTX 760 which is of cuda capability 3.0.
    PyTorch no longer supports this GPU because it is too old.
    The minimum cuda capability that we support is 3.5.
  warnings.warn(old_gpu_warn % (d, name, major, capability[1]))
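The warning above boils down to a tuple comparison on the device's (major, minor) compute capability against a minimum. A minimal sketch of that gate (the function name is mine, not PyTorch's):

```python
def meets_min_capability(device_cc, min_cc=(3, 5)):
    """True if a (major, minor) compute capability is at least min_cc."""
    return tuple(device_cc) >= tuple(min_cc)

# The GTX 760 reports capability 3.0, below the 3.5 floor in this warning:
print(meets_min_capability((3, 0)))  # False
print(meets_min_capability((3, 5)))  # True
```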

Now I will try to install 1.0, and if that does not work I will try 0.4.x.

Argh, so it's still not working :cry: I am about to give up. They build and install fine now, but I get the same "your GPU is too old" error, even though the build picks up that it is 3.0. I tried v1.0.1, 1.0.0, and 0.4.1; none are working.

any ideas?

EDIT: I was able to solve this by updating CUDA from 8.0 to 10.0. All seems good now.

Then you won’t be able to do much, as apparently some CUDA methods are used which need compute capability >= 3.5. I’m sorry you have spent so much time building it.

You could try to run your code locally on the CPU or alternatively on Google Colab using their GPUs.

Edit: Just saw your edit. Is it working now with CUDA 10.0 and your GPU?


Yep, working now :slight_smile: Upgrading to CUDA 10 and using gcc 4.9 did the trick. Thanks for your time!
