Distribution of pytorch extension

GuillaumeLeclerc · February 21, 2023, 9:13pm

Hello,

I implemented a new operator in cuda and would like to make it available on pip/conda. What is the best way to do it?

Ideally I would build a wheel, but I’ve met a lot of issues generating something that is remotely portable from one environment/machine to another. Another issue I don’t know how to solve is how to automatically pick the wheel based on the pytorch version the user has.
If users compile when the package is installed, they need nvcc, which is not installed with pytorch using the recommended installation commands. Moreover, it seems that the GCC version that ships with conda isn’t compatible with cuda 11.7, so I would need users to have a custom combination of GCC/cuda installed outside of conda.

Am I missing something? What is the recommended way of distributing pytorch extensions?

Thanks for the help

GuillaumeLeclerc · February 21, 2023, 9:56pm

Regarding the wheel: for example, a wheel compiled on another machine with the same version of python and the same major version of pytorch yields this error when imported:

site-packages/fast_jl.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c105ErrorC2ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

(which is a symbol related to pytorch)
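One possible reading of this error, based on the symbol name alone: the `NSt7__cxx11` fragment is how the Itanium C++ mangling spells `std::__cxx11::`, which suggests the extension was compiled against the newer libstdc++ string ABI. If the PyTorch installed on the target machine was built with the older ABI, it would not export a symbol with that name. This is an assumption, not a confirmed diagnosis. The helper name below is hypothetical; it is a minimal sketch that only inspects the symbol text.

```python
def mentions_cxx11_abi(mangled_symbol: str) -> bool:
    """Return True if a mangled C++ symbol refers to the std::__cxx11 namespace.

    The "__cxx11" marker appears in symbols whose signature uses the newer
    libstdc++ std::string / std::list layout (_GLIBCXX_USE_CXX11_ABI=1).
    """
    return "__cxx11" in mangled_symbol


# The symbol reported in the import error above.
missing = (
    "_ZN3c105ErrorC2ENS_14SourceLocationE"
    "NSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE"
)

if mentions_cxx11_abi(missing):
    # On the target machine, torch.compiled_with_cxx11_abi() reports which
    # ABI the installed PyTorch binary was built with; the two should match.
    print("extension expects the new C++11 ABI")
```

If the two sides disagree, rebuilding the extension with the same `_GLIBCXX_USE_CXX11_ABI` setting as the target PyTorch would be the first thing to try.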

ptrblck · February 22, 2023, 5:33am

Creating binaries is not trivial if you are shipping C++/CUDA code. You could take a look at our pytorch/builder repository to see how the general process is done for PyTorch.
Ideally, you would ship your package with a statically linked CUDA runtime library and would then link against PyTorch and its dependencies.

GuillaumeLeclerc · February 22, 2023, 7:36pm

So statically link against CUDA and dynamically link against pytorch? What if pytorch upgrades?

Do you have an example of such a thing?

GuillaumeLeclerc · February 22, 2023, 8:23pm

I can’t even get it to compile reliably at install time. With GCC 11.3 and the latest version of pytorch, it fails with this issue:

github.com/NVlabs/instant-ngp: "error: parameter packs not expanded with ‘...’" (opened 01:34PM - 07 Feb 22 UTC, closed 09:41AM - 10 Feb 22 UTC, by duckworthd)

I suspect this issue is not directly related to instant-ngp's code, but I'll post it here all the same in case anyone else sees a similar issue. TL;DR there's an issue compiling `std_function.h`, a dependency of `tiny-cuda-nn`.

```
> cmake --build build --config RelWithDebInfo
Consolidate compiler generated dependencies of target tiny-cuda-nn
[ 1%] Building CUDA object dependencies/tiny-cuda-nn/src/CMakeFiles/tiny-cuda-nn.dir/common.cu.o
/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
  435 | function(_Functor&& __f)
      | ^
/usr/include/c++/11/bits/std_function.h:435:145: note: ‘_ArgTypes’
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
  530 | operator=(_Functor&& __f)
      | ^
/usr/include/c++/11/bits/std_function.h:530:146: note: ‘_ArgTypes’
gmake[2]: *** [dependencies/tiny-cuda-nn/src/CMakeFiles/tiny-cuda-nn.dir/build.make:76: dependencies/tiny-cuda-nn/src/CMakeFiles/tiny-cuda-nn.dir/common.cu.o] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:306: dependencies/tiny-cuda-nn/src/CMakeFiles/tiny-cuda-nn.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2
```

Here's a bit more info on my setup. I'm using CUDA v11.4.4 and GCC v.11.2.0 on Debian. I saw an identical issue when compiling with CUDA v11.6.0.

```
> nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_Oct_11_21:27:02_PDT_2021
Cuda compilation tools, release 11.4, V11.4.152
Build cuda_11.4.r11.4/compiler.30521435_0

> gcc --version
gcc (Debian 11.2.0-13) 11.2.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
```

Any tips appreciated!

It works with older versions of GCC, but since 11.3 is the default shipped with Ubuntu 22, it will be a pain for users to have to downgrade their compiler to install a python package.
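If compiling at install time stays on the table, one way to soften this would be a pre-flight check that reports the toolchain mismatch in plain words before the compiler emits an error like the one above. The sketch below parses the text printed by `gcc --version` and `nvcc --version`. The function names are hypothetical, and `max_gcc_major` is a placeholder the packager would have to set: the thread only establishes that older GCC versions work, not an exact compatibility limit.

```python
import re


def parse_gcc_version(gcc_version_output):
    """Return (major, minor, patch) from the first line of `gcc --version`."""
    first_line = gcc_version_output.strip().splitlines()[0]
    # The compiler version is the last dotted triple on the first line,
    # e.g. "gcc (Debian 11.2.0-13) 11.2.0" -> (11, 2, 0).
    triples = re.findall(r"(\d+)\.(\d+)\.(\d+)", first_line)
    if not triples:
        raise ValueError("no GCC version found in: %r" % first_line)
    return tuple(int(part) for part in triples[-1])


def parse_nvcc_release(nvcc_version_output):
    """Return (major, minor) from the 'release X.Y' field of `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_version_output)
    if match is None:
        raise ValueError("no CUDA release found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def toolchain_problem(gcc_version, max_gcc_major):
    """Return a readable message if GCC is newer than the assumed limit, else None.

    max_gcc_major is an assumption supplied by the packager, not a value
    taken from any official compatibility table.
    """
    if gcc_version[0] > max_gcc_major:
        return (
            "Found GCC %d.%d.%d, but this package is only known to build "
            "with GCC %d or older." % (gcc_version + (max_gcc_major,))
        )
    return None


# Example using the outputs quoted in the issue above.
gcc = parse_gcc_version("gcc (Debian 11.2.0-13) 11.2.0\nCopyright (C) 2021 ...")
cuda = parse_nvcc_release("Cuda compilation tools, release 11.4, V11.4.152")
print(gcc, cuda, toolchain_problem(gcc, max_gcc_major=10))
```

In a real `setup.py`, the two strings would come from running the compilers with `subprocess.run([...], capture_output=True, text=True)`.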

Do you think it would be simpler to ditch the interface with pytorch and maybe use cupy or pycuda?

ptrblck · February 22, 2023, 10:34pm

Statically linking the CUDA runtime is the right approach. I don’t know which additional CUDA libraries you would need.

If PyTorch is upgraded, your package might need to be updated as well, assuming it depends on PyTorch directly. If you don’t use any PyTorch operations, your updates would be more flexible.
You can also take a look at e.g. pytorch_geometric, which should have a release process similar to the one you are targeting.
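For the question of picking a wheel that matches the user's PyTorch, one pattern used by projects in the pytorch_geometric ecosystem is to publish a separate wheel index page per PyTorch/CUDA combination and have users pass it to pip with `-f` (`--find-links`). The sketch below only builds such a URL. The function name, the `base_url` and the exact page naming are assumptions for illustration, not an existing service.

```python
import re


def wheel_index_url(torch_version, cuda_version, base_url="https://example.com/whl"):
    """Build a find-links URL for a given PyTorch / CUDA combination.

    torch_version: e.g. the value of torch.__version__ ("1.13.1+cu117").
    cuda_version:  e.g. the value of torch.version.cuda ("11.7"), or None
                   for a CPU-only PyTorch build.
    base_url:      hypothetical location of the per-version index pages.
    """
    # Keep only major.minor.patch and drop local tags such as "+cu117".
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", torch_version)
    if match is None:
        raise ValueError("unrecognised PyTorch version: %r" % torch_version)
    release = ".".join(match.groups())
    variant = "cpu" if cuda_version is None else "cu" + cuda_version.replace(".", "")
    return "%s/torch-%s+%s.html" % (base_url, release, variant)


print(wheel_index_url("1.13.1+cu117", "11.7"))
```

A user would then run something like `pip install fast_jl -f <that URL>`, so that pip only sees wheels built against their PyTorch and CUDA versions.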

I don’t fully understand why users would need to build your package if you are planning to release wheels.

Whether to drop the PyTorch interface depends, I think, on your use case and what your package is supposed to do.
If you don’t have any dependency on PyTorch and are not using any of its operations, you might not want to add it as a dependency, and using e.g. pycuda might be easier.