ImportError: undefined symbol

Picus · September 15, 2024, 3:38pm

Hi! I’m trying to make my first PyTorch extension!
I use setuptools, pybind11 and CMake to do it.

Compiling with python setup.py install works fine, but at import time I get an error that I’ve never seen before: ImportError: <path_to_the_lib_so_file>: undefined symbol: _ZN8pybind116detail11type_casterIN2at6TensorEvE4loadENS_6handleEb

Does anyone have a clue what it even means?
Here is the code:

my_extension.cpp

#include <torch/extension.h>
#include <pybind11/pybind11.h>

torch::Tensor multiply_by_two(torch::Tensor input) {
    return input * 2;
}

PYBIND11_MODULE(my_extension, m) {
    m.def("multiply_by_two", &multiply_by_two, "A function that multiplies each element in a tensor by 2");
}

CMakeLists.txt

cmake_minimum_required(VERSION 3.5)
project(my_extension LANGUAGES CXX)

# Doesn't work without this for some reason
set(CMAKE_PREFIX_PATH "/home/x00/anaconda3/envs/retro/lib/python3.10/site-packages/torch")

find_package(pybind11 REQUIRED)
find_package(Torch REQUIRED)

pybind11_add_module(my_extension my_extension.cpp)

target_link_libraries(my_extension PRIVATE ${TORCH_LIBRARIES} ${CMAKE_DL_LIBS})
target_include_directories(my_extension PRIVATE ${TORCH_INCLUDE_DIRS})
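
For readers wondering what the symbol means: it is a C++ name mangled according to the Itanium ABI. Demangled, it reads pybind11::detail::type_caster<at::Tensor, void>::load(pybind11::handle, bool), which is the routine pybind11 calls to convert a Python object into an at::Tensor argument. As far as I know, recent PyTorch releases compile that function into libtorch_python.so, which ${TORCH_LIBRARIES} does not include, so a commonly suggested fix is to locate torch_python with find_library under ${TORCH_INSTALL_PREFIX}/lib and add it to target_link_libraries. The sketch below covers only the demangling step; it assumes the c++filt tool from binutils is on PATH, and the helper name demangle is made up for illustration.

```python
import shutil
import subprocess
from typing import Optional

# The symbol reported in the ImportError above.
SYMBOL = "_ZN8pybind116detail11type_casterIN2at6TensorEvE4loadENS_6handleEb"


def demangle(symbol: str) -> Optional[str]:
    """Return the demangled C++ name, or None if c++filt is not installed."""
    tool = shutil.which("c++filt")
    if tool is None:
        return None
    # c++filt prints the demangled form of each symbol given as an argument.
    result = subprocess.run(
        [tool, symbol], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(demangle(SYMBOL) or "c++filt not found on PATH")
```

Running nm -D --defined-only on the shared libraries in torch/lib and searching for the mangled name is one way to confirm which library actually provides it on a given installation.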

gauenk · September 21, 2024, 2:44pm

The easiest thing is to skip CMake and let setuptools do the compiling. Your command will then be python -m pip install -e . (much like you are already doing), but you’ll need to create a setup.py file by following the docs. Here is an example of mine for reference.

However, this is dreadfully slow, so I’ve been looking for a way to use CMake instead. Setuptools seems to recompile the entire project each time, and it is much easier to use ccache to speed up compilation when using CMake. The speed-up from ccache is significant: it brings a compile time of several minutes down to only 2-4 seconds.

I stumbled on this thread since I am having linking errors like you with pybind11. So if I solve the issue and remember, I will post my findings back here.
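
For reference, a setuptools-only build along the lines described above might look like the following. This is a minimal sketch, not the example referred to in the post. It assumes the my_extension.cpp file from the question sits next to setup.py, and the helper name make_setup_kwargs is made up for illustration. CppExtension and BuildExtension come from torch.utils.cpp_extension; as far as I know, CppExtension adds PyTorch's include paths and also links torch_python, which is where the missing symbol is expected to live.

```python
# setup.py (sketch)
import sys


def make_setup_kwargs():
    """Collect the keyword arguments for setuptools.setup()."""
    # Imported lazily so PyTorch's build helpers are only needed when building.
    from torch.utils.cpp_extension import BuildExtension, CppExtension

    return {
        "name": "my_extension",
        "ext_modules": [
            # The name must match the one passed to PYBIND11_MODULE in the .cpp file.
            CppExtension(name="my_extension", sources=["my_extension.cpp"]),
        ],
        # BuildExtension supplies the compiler and linker flags PyTorch expects.
        "cmdclass": {"build_ext": BuildExtension},
    }


# Only call setup() when a command (install, build_ext, ...) was passed.
if __name__ == "__main__" and len(sys.argv) > 1:
    from setuptools import setup

    setup(**make_setup_kwargs())
```

With this file in place, python -m pip install -e . should build and install the module so that import my_extension works after import torch.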

gauenk · September 21, 2024, 5:47pm

Okay, so I’ve figured out how to speed up compiling using setuptools rather than CMake. At the top of your setup.py file, add the following few lines:

import os
os.environ['PYTORCH_NVCC'] = "ccache nvcc"
os.environ['TORCH_EXTENSION_SKIP_NVCC_GEN_DEPENDENCIES'] = '1'

To make this work, you may have to compile ccache from source, since the apt-get version seems to be out of date. I then symlink the built ccache only to the C++ compilers:

ln -s ccache /usr/local/bin/gcc
ln -s ccache /usr/local/bin/g++
ln -s ccache /usr/local/bin/c++

Hopefully it goes smoothly for you as well.
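
One way to check that the symlinks took effect is to resolve each compiler name on PATH and see whether it ends up at the ccache binary. A minimal sketch, assuming the directory holding the symlinks (here /usr/local/bin) comes before the real compilers on PATH; the helper name resolves_to_ccache is made up for illustration.

```python
import os
import shutil


def resolves_to_ccache(compiler: str) -> bool:
    """Return True if `compiler` on PATH is a symlink ending at a ccache binary."""
    path = shutil.which(compiler)
    if path is None:
        return False
    # realpath follows the symlink chain to the file that actually runs.
    return os.path.basename(os.path.realpath(path)) == "ccache"


if __name__ == "__main__":
    for name in ("gcc", "g++", "c++"):
        status = "ccache" if resolves_to_ccache(name) else "not ccache"
        print(f"{name} -> {status}")
```

After a rebuild, ccache -s prints hit and miss statistics, which shows whether the cache is actually being used.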

Picus · September 25, 2024, 12:41am

Thanks, but I’m not sure I can use that.

See, this is my first try, but in the future I plan to manually access the contents of tensors on the GPU with ROCm. For that, I need to compile my code using hipcc. With CMake I would just need to use project(my_extension LANGUAGES HIP), but I don’t think I can change the compiler used by setuptools, can I?

Picus · September 25, 2024, 1:20am

Update: it doesn’t work.

I have to choose between the undefined symbol and PyTorch crying because I’m not using gcc or g++. I tried to bypass this check, but the error that occurs after that is impossible for me to debug.
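
On whether setuptools can target hipcc: as far as I know, ROCm builds of PyTorch report a version string in torch.version.hip, and on such builds torch.utils.cpp_extension.CUDAExtension runs the sources through hipify and compiles device code with hipcc, so the compiler does not have to be swapped by hand. The sketch below picks the extension class accordingly. It is untested on ROCm, and the helper name pick_extension and the file name my_kernels.cu are hypothetical.

```python
def pick_extension(name, cpp_sources, gpu_sources):
    """Return a CUDAExtension when GPU sources and a GPU build exist, else a CppExtension."""
    import torch
    from torch.utils.cpp_extension import CppExtension, CUDAExtension

    # torch.version.cuda / torch.version.hip are None on builds without that backend.
    gpu_build = torch.version.cuda is not None or torch.version.hip is not None
    if gpu_sources and gpu_build:
        # Assumption: on ROCm builds this path hipifies the sources and uses hipcc.
        return CUDAExtension(name=name, sources=list(cpp_sources) + list(gpu_sources))
    return CppExtension(name=name, sources=list(cpp_sources))


if __name__ == "__main__":
    ext = pick_extension("my_extension", ["my_extension.cpp"], ["my_kernels.cu"])
    print(type(ext).__name__, ext.sources)
```

The returned object would go into ext_modules in setup.py, together with cmdclass={"build_ext": BuildExtension}.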

gauenk · September 25, 2024, 3:40am

I am probably not much help since I haven’t worked with ROCm. The following link suggests it works on at least one setup:

There is also an open issue in the PyTorch tutorials repository:

github.com/pytorch/tutorials

Custom rocm hip and c++ extensions

Opened 05:12PM - 31 Dec 22 UTC by salykova

Labels: triaged, module: rocm, advanced, amd, docathon-h2-2023, ciflow/rocm

### 🚀 The feature, motivation and pitch

Dear PyTorch developers and community…, We have nice tutorial [cpp_extension](https://pytorch.org/tutorials/advanced/cpp_extension.html) on custom cuda extensions written by Peter Goldsborough. I’m wondering if the same can be done but on AMD GPUs with kernels written using rocm HIP. I mean the following: call custom forward+backward hip kernel from pytorch and include it in deep learning pipeline. Is it currently supported and are there any limitations? Does somebody have experience of writing custom hip/c++ kernels and using them in pytorch?

cc @sekyondaMeta @svekars @carljparker @NicolasHug @kit1980 @subramen @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport

Best wishes