cpp_extension.load_inline: does it use NVRTC or spawn compiler processes? Does it cache its output?

I have a cpp_extension that consists only of cuda_sources; my cpp_sources are just function declarations. It seems that load_inline still spawns compiler processes, and it's not clear whether/how it caches its output. Is that true?

Can I force it to use NVRTC? And can I omit the function declarations in cpp_sources if the function definitions are already in cuda_sources?
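For context, here is a minimal sketch of the setup described above: a declaration in cpp_sources and the matching definition in cuda_sources. The extension name and `add_one` function are made up for illustration, and the snippet is guarded so it only tries to compile when torch, a GPU, and nvcc are all available:

```python
import shutil

compiled = False
try:
    import torch
    from torch.utils.cpp_extension import load_inline

    if torch.cuda.is_available() and shutil.which("nvcc"):
        # cpp_sources: declaration only; load_inline generates the
        # pybind11 bindings for it in the C++ translation unit.
        cpp_src = "torch::Tensor add_one(torch::Tensor x);"
        # cuda_sources: the actual definition, compiled by nvcc.
        cuda_src = """
        #include <torch/extension.h>
        torch::Tensor add_one(torch::Tensor x) { return x + 1; }
        """
        ext = load_inline(
            name="add_one_ext",   # also names the cache subdirectory
            cpp_sources=cpp_src,
            cuda_sources=cuda_src,
            functions=["add_one"],
            verbose=True,         # prints ninja output like the log below
        )
        compiled = bool(ext.add_one(torch.zeros(1, device="cuda")).item() == 1)
except ImportError:
    pass  # torch not installed; nothing to demonstrate
```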

cc @goldsborough

Without changing anything, I get the following verbose output on every run, so it seems caching is broken. It also seems that it's not using the (faster?) NVRTC path for some reason. I assume NVRTC support is compiled in, since it ought to be used for fusion / kernel generation.

```
Using /home/kantorov/.cache/torch_extensions as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/kantorov/.cache/torch_extensions/roipooling_contextlocnet/build.ninja...
Building extension module roipooling_contextlocnet...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] c++ -MMD -MF main.o.d -DTORCH_EXTENSION_NAME=roipooling_contextlocnet -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/vadim/prefix/miniconda/lib/python3.9/site-packages/torch/include -isystem /home/vadim/prefix/miniconda/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/vadim/prefix/miniconda/lib/python3.9/site-packages/torch/include/TH -isystem /home/vadim/prefix/miniconda/lib/python3.9/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/vadim/prefix/miniconda/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /home/kantorov/.cache/torch_extensions/roipooling_contextlocnet/main.cpp -o main.o
[2/3] /usr/local/cuda/bin/nvcc  -DTORCH_EXTENSION_NAME=roipooling_contextlocnet -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/vadim/prefix/miniconda/lib/python3.9/site-packages/torch/include -isystem /home/vadim/prefix/miniconda/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/vadim/prefix/miniconda/lib/python3.9/site-packages/torch/include/TH -isystem /home/vadim/prefix/miniconda/lib/python3.9/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /home/vadim/prefix/miniconda/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -std=c++14 -c /home/kantorov/.cache/torch_extensions/roipooling_contextlocnet/cuda.cu -o cuda.cuda.o
/home/kantorov/.cache/torch_extensions/roipooling_contextlocnet/cuda.cu(82): warning: variable "is_empty" was declared but never referenced
[3/3] c++ main.o cuda.cuda.o -shared -L/home/vadim/prefix/miniconda/lib/python3.9/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/usr/local/cuda/lib64 -lcudart -o roipooling_contextlocnet.so
Loading extension module roipooling_contextlocnet...
```
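For what it's worth, my understanding of the intended behavior is ninja-style caching: the emitted sources live in the extension's build directory, and a build step reruns only when its inputs change. Here is a pure-Python sketch of that content-hash idea, not PyTorch's actual implementation (`build_if_changed` and the stamp file are made up for illustration):

```python
import hashlib
import os
import tempfile

def build_if_changed(build_dir, source, compile_fn):
    """Invoke compile_fn only when `source` differs from the cached copy.

    Loosely mimics how a ninja build skips steps whose inputs are
    unchanged; compile_fn stands in for the real compiler invocation.
    Returns True on a rebuild, False on a cache hit.
    """
    os.makedirs(build_dir, exist_ok=True)
    stamp = os.path.join(build_dir, "source.sha256")
    digest = hashlib.sha256(source.encode()).hexdigest()
    if os.path.exists(stamp):
        with open(stamp) as f:
            if f.read() == digest:
                return False  # cache hit: nothing to rebuild
    compile_fn(source)        # cache miss: "rebuild"
    with open(stamp, "w") as f:
        f.write(digest)
    return True

# Usage: the first call "compiles", the second is a cache hit.
with tempfile.TemporaryDirectory() as d:
    calls = []
    rebuilt_first = build_if_changed(d, "kernel v1", calls.append)
    rebuilt_second = build_if_changed(d, "kernel v1", calls.append)
```

If load_inline behaved like this, a second run with unchanged sources should print nothing past the "Using ... as PyTorch extensions root" line, which is not what I observe.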