Link errors for extension with -rdc=true


I'm trying to track down a linker issue, specifically a runtime .so loading error
undefined symbol: __cudaRegisterLinkedBinary
for a function compiled and linked with -rdc=true (separate compilation).
My setup is:
NVCC_FLAGS := -arch sm_61 -gencode=arch=compute_61,code=compute_61 \
-std=c++11 -O3 --use_fast_math -x cu -Xcompiler -fPIC -lineinfo -rdc=true # -ptx
NVCC_LINK_FLAGS := -arch sm_61 -lcudadevrt -lcudart -Xcompiler -fPIC -shared

$(NVCC_COMPILE) build/file.o src/ $(NVCC_FLAGS) $(INCLUDE_FLAGS)
$(NVCC_LINK) -o build/ build/file.o $(NVCC_LINK_FLAGS)

Then in

extra_link_args=["-L/usr/local/cuda-9.0/targets/x86_64-linux/lib/ -lcudadevrt -lcudart"],

extra_objects += [osp.join(abs_path, 'build/')]
extra_objects += [osp.join(abs_path, 'build/file.o')]
extra_objects += glob.glob('/usr/local/cuda/lib64/*.a')

The final extension .so file doesn’t have the reference to __cudaRegisterLinkedBinary
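
One way to check which file actually carries the unresolved reference is to list the undefined symbols with nm. The following is a minimal sketch, assuming binutils nm is on the PATH and Python 3.7+; the helper names and the sample symbol in the usage note are made up for illustration and are not from the setup above.

```python
import subprocess

# Prefix of the registration symbols nvcc emits for objects built with -rdc=true.
PREFIX = "__cudaRegisterLinkedBinary"


def undefined_rdc_symbols(nm_output):
    """Return undefined symbols from `nm` text output that start with PREFIX."""
    found = []
    for line in nm_output.splitlines():
        parts = line.split()
        # Undefined entries have no address column, so they split into: "U", name.
        if len(parts) == 2 and parts[0] == "U" and parts[1].startswith(PREFIX):
            found.append(parts[1])
    return found


def check_shared_object(path):
    """Run `nm -D` on a shared object and report unresolved registration symbols."""
    result = subprocess.run(
        ["nm", "-D", path], check=True, capture_output=True, text=True
    )
    return undefined_rdc_symbols(result.stdout)
```

Calling check_shared_object on the built extension returns a non-empty list when the .so still expects a registration symbol that nothing provides. Running plain nm (without -D) on each .o file and passing the text to undefined_rdc_symbols narrows down which object was compiled with -rdc=true.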

Does anyone have experience with separate compilation combined with create_extension?

Never mind, I figured it out. This setup works; the link error was coming from a different file with a similar name that didn’t need -rdc=true, which I had missed. Hopefully this is useful for future reference.

Hello Andrei,
If you still have it handy, could you share the boilerplate of the setup you’ve mentioned above?
I think the community could really benefit from this, given that no specific examples pop up when it comes to extending PyTorch with dynamic parallelism.
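
Until the original files are shared, here is a rough sketch of what such boilerplate might look like. It is not the original poster's setup: it uses an explicit nvcc -dlink step rather than the nvcc -shared link shown above, and the function names, the device_link.o file name, and the default paths are assumptions for illustration. It only builds the command lists and the keyword arguments; it does not run nvcc.

```python
import os


def rdc_build_commands(cu_sources, build_dir="build", arch="sm_61"):
    """Build nvcc command lines for separate compilation plus a device-link step.

    Returns (compile_cmds, dlink_cmd, objects); objects includes the
    device-link object, which defines the __cudaRegisterLinkedBinary_* symbols.
    """
    compile_cmds, objects = [], []
    for src in cu_sources:
        stem = os.path.splitext(os.path.basename(src))[0]
        obj = os.path.join(build_dir, stem + ".o")
        # -rdc=true produces relocatable device code that must be device-linked.
        compile_cmds.append(
            ["nvcc", "-arch", arch, "-rdc=true", "-Xcompiler", "-fPIC",
             "-c", src, "-o", obj]
        )
        objects.append(obj)

    # Hypothetical name for the device-link output object.
    dlink_obj = os.path.join(build_dir, "device_link.o")
    dlink_cmd = (
        ["nvcc", "-arch", arch, "-dlink", "-Xcompiler", "-fPIC"]
        + objects
        + ["-o", dlink_obj, "-lcudadevrt"]
    )
    return compile_cmds, dlink_cmd, objects + [dlink_obj]


def extension_kwargs(objects, cuda_lib_dir="/usr/local/cuda/lib64"):
    """Keyword arguments intended to be forwarded to the extension builder."""
    return {
        "extra_objects": list(objects),
        "library_dirs": [cuda_lib_dir],  # assumed CUDA install location
        "libraries": ["cudadevrt", "cudart"],
    }
```

Each command list can be passed to subprocess.check_call, and the resulting dictionary is meant to be passed as extra keyword arguments to create_extension. Only the files that actually use dynamic parallelism need to go through rdc_build_commands; mixing in an object that was built with -rdc=true but never device-linked reproduces the undefined symbol error.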

