Link errors for extension with -rdc=true

Hello,

I'm trying to figure out a linker issue, specifically an error when the .so is loaded at runtime:
undefined symbol: __cudaRegisterLinkedBinary
It occurs for a function compiled and linked with -rdc=true (separate compilation).
My setup is:
NVCC_FLAGS := -arch sm_61 -gencode=arch=compute_61,code=compute_61 \
    -std=c++11 -O3 --use_fast_math -x cu -Xcompiler -fPIC -lineinfo -rdc=true # -ptx
NVCC_LINK_FLAGS := -arch sm_61 -lcudadevrt -lcudart -Xcompiler -fPIC -shared

$(NVCC_COMPILE) build/file.o src/file.cu $(NVCC_FLAGS) $(INCLUDE_FLAGS)
$(NVCC_LINK) -o build/file_dlink.so build/file.o $(NVCC_LINK_FLAGS)

Then in build_ffi.py:

extra_compile_args=["-std=c99"],
extra_link_args=["-L/usr/local/cuda-9.0/targets/x86_64-linux/lib/ -lcudadevrt -lcudart"],

extra_objects += [osp.join(abs_path, 'build/file_dlink.so')]
extra_objects += [osp.join(abs_path, 'build/file.o')]
extra_objects += glob.glob('/usr/local/cuda/lib64/*.a')
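For anyone wanting the fragments above in one place, here is a minimal sketch that gathers them into a single dict of keyword arguments. The helper name extension_kwargs and its cuda_lib_dir parameter are made up for illustration; the flag values are copied verbatim from the post. The commented call assumes the old torch.utils.ffi.create_extension API, and the extension name, header, and source paths in it are placeholders, not the original author's.

```python
import glob
import os.path as osp


def extension_kwargs(abs_path, cuda_lib_dir="/usr/local/cuda/lib64"):
    """Collect the compile/link settings from the post into one dict.

    `abs_path` is the project root that contains the build/ directory.
    """
    extra_objects = [
        # Shared object produced by the nvcc link step (device code linked).
        osp.join(abs_path, "build/file_dlink.so"),
        # Host object compiled with -rdc=true.
        osp.join(abs_path, "build/file.o"),
    ]
    # Static CUDA libraries, as in the original build_ffi.py.
    extra_objects += sorted(glob.glob(osp.join(cuda_lib_dir, "*.a")))
    return {
        "extra_compile_args": ["-std=c99"],
        # Copied as written in the post.
        "extra_link_args": [
            "-L/usr/local/cuda-9.0/targets/x86_64-linux/lib/ -lcudadevrt -lcudart"
        ],
        "extra_objects": extra_objects,
    }


# Hypothetical usage (names and paths below are placeholders):
#   from torch.utils.ffi import create_extension
#   ffi = create_extension(
#       "_ext.my_lib",
#       headers=["src/my_lib.h"],
#       sources=["src/my_lib.c"],
#       with_cuda=True,
#       relative_to=__file__,
#       **extension_kwargs(abs_path),
#   )
#   ffi.build()
```

Keeping the settings in a plain function makes it easy to print and check the object list before the extension is actually built.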

The final extension .so file doesn’t have the reference to __cudaRegisterLinkedBinary

Does anyone have experience with separate compilation combined with create_extension?

Never mind, I figured it out. This setup works; the link error was coming from a different file with a similar name that didn’t need -rdc=true, which I missed. Hopefully this is useful for future reference.
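For future readers hitting the same error, one way to find which object still carries an unresolved registration symbol is to look for undefined ("U") entries in nm output. The sketch below is only an illustration: the function names are made up, it assumes the nm tool from binutils is on the PATH, and for a stripped shared object you would likely need nm -D instead.

```python
import subprocess

PREFIX = "__cudaRegisterLinkedBinary"


def undefined_symbols(nm_output, prefix=PREFIX):
    """Return symbols marked 'U' (undefined) in nm output that start with prefix."""
    found = []
    for line in nm_output.splitlines():
        parts = line.split()
        # Undefined entries have no address column, so they split as: U <name>
        if len(parts) == 2 and parts[0] == "U" and parts[1].startswith(prefix):
            found.append(parts[1])
    return found


def scan(paths):
    """Run nm on each file and map it to its unresolved registration symbols."""
    report = {}
    for path in paths:
        out = subprocess.run(
            ["nm", path], capture_output=True, text=True, check=True
        ).stdout
        report[path] = undefined_symbols(out)
    return report
```

Calling scan(["build/file.o", "build/other_file.o"]) (file names here are placeholders) gives a per-file list. An object compiled with -rdc=true will normally show such an undefined symbol until it goes through the nvcc device-link step, so a file that shows one but is never device-linked is a likely culprit.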

Hello Andrei,
If you still have it handy, could you share the boilerplate of the setup you’ve mentioned above?
I think the community could really benefit from this, given that no specific examples pop up when it comes to extending PyTorch with dynamic parallelism.

You can also DM me on Twitter at @KnockturnalNed