Just-in-time loading and compiling CUDA kernels was unsuccessful

I’m planning to use the repo GitHub - asappresearch/sru: Training RNNs as Fast as CNNs (https://arxiv.org/abs/1709.02755), which implements an RNN variant that is fast to train. I followed the installation instructions, which were very simple, but got the following error:

/home/hnguyen/sru/sru/cuda_functional.py:23: UserWarning: Just-in-time loading and compiling the CUDA kernels of SRU was unsuccessful. Got the following error:
Error building extension 'sru_cuda': [1/2] /usr/local/cuda-10.1/bin/nvcc --generate-dependencies-with-compile --dependency-output sru_cuda_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=sru_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/hnguyen/miniconda3/lib/python3.8/site-packages/torch/include -isystem /home/hnguyen/miniconda3/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/hnguyen/miniconda3/lib/python3.8/site-packages/torch/include/TH -isystem /home/hnguyen/miniconda3/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-10.1/include -isystem /home/hnguyen/miniconda3/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -std=c++14 -c /home/hnguyen/sru/sru/csrc/sru_cuda_kernel.cu -o sru_cuda_kernel.cuda.o 
FAILED: sru_cuda_kernel.cuda.o 
/usr/local/cuda-10.1/bin/nvcc --generate-dependencies-with-compile --dependency-output sru_cuda_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=sru_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/hnguyen/miniconda3/lib/python3.8/site-packages/torch/include -isystem /home/hnguyen/miniconda3/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/hnguyen/miniconda3/lib/python3.8/site-packages/torch/include/TH -isystem /home/hnguyen/miniconda3/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-10.1/include -isystem /home/hnguyen/miniconda3/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -std=c++14 -c /home/hnguyen/sru/sru/csrc/sru_cuda_kernel.cu -o sru_cuda_kernel.cuda.o 
nvcc fatal   : Unknown option '-generate-dependencies-with-compile'
ninja: build stopped: subcommand failed.

  warnings.warn("Just-in-time loading and compiling the CUDA kernels of SRU was unsuccessful. "

after running

import torch
from sru import SRU

This problem is not mentioned in any of the repository’s existing issues, probably because it is fairly uncommon.

Does anyone understand what this error is trying to say?

The --generate-dependencies-with-compile flag was added to nvcc in CUDA 10.2, if I’m not mistaken, while your log shows the build invoking the nvcc from /usr/local/cuda-10.1. So you might need to update your local CUDA toolkit to 10.2 or later so that it matches what your PyTorch build expects.
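A quick way to confirm the mismatch is to compare the toolkit version that `nvcc --version` reports against 10.2, the first release that accepts this flag. Here is a minimal, hedged sketch — the helper functions `nvcc_version` and `supports_dep_with_compile` are illustrative names, not part of SRU or PyTorch:

```python
import re
import subprocess

def nvcc_version(nvcc="nvcc"):
    """Return (major, minor) parsed from `nvcc --version`, or None if nvcc is missing."""
    try:
        out = subprocess.run([nvcc, "--version"],
                             capture_output=True, text=True).stdout
    except FileNotFoundError:
        return None
    m = re.search(r"release (\d+)\.(\d+)", out)
    return (int(m.group(1)), int(m.group(2))) if m else None

def supports_dep_with_compile(version):
    """--generate-dependencies-with-compile first appeared in CUDA 10.2."""
    return version is not None and version >= (10, 2)

# A CUDA 10.1 nvcc (as in the log above) rejects the flag:
print(supports_dep_with_compile((10, 1)))  # False -> "Unknown option" error
print(supports_dep_with_compile((10, 2)))  # True

# Check the nvcc actually on your PATH:
print(nvcc_version())
```

If `nvcc_version()` comes back below (10, 2), the fix is to point the build at a newer toolkit (e.g. via the CUDA_HOME environment variable) or upgrade CUDA itself.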