I'm unable to build the FlowNet 2.0 CUDA kernels for the channelnorm, resample2d, and correlation layers with PyTorch >= 1.5.1, although they build and run fine with PyTorch <= 1.4.0. Is there a way to make this work? I need to use PyTorch >= 1.5.1.
Here is a snippet of the (long) error log I get:
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/home/software/miniconda2/envs/cu101torch151s2/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1423, in _run_ninja_build
  File "/home/software/miniconda2/envs/cu101torch151s2/lib/python3.6/subprocess.py", line 438, in run
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "setup.py", line 32, in <module>
  File "/home/software/miniconda2/envs/cu101torch151s2/lib/python3.6/site-packages/setuptools/__init__.py", line 163, in setup
  File "/home/software/miniconda2/envs/cu101torch151s2/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1163, in _write_ninja_file_and_compile_objects
    error_prefix='Error compiling objects for extension')
  File "/home/software/miniconda2/envs/cu101torch151s2/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1436, in _run_ninja_build
RuntimeError: Error compiling objects for extension
To reproduce the error with PyTorch >= 1.5.1 (installed via conda):
# get flownet2-pytorch source
git clone https://github.com/NVIDIA/flownet2-pytorch.git
# install custom layers
Could you disable ninja for the build of the custom extension and post the stack trace with the error message here, please?
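A minimal sketch of how ninja could be turned off for one custom layer, assuming the layer's setup.py builds a CUDAExtension with torch.utils.cpp_extension.BuildExtension. The extension and source file names below ("example_cuda", "example_cuda.cc", "example_cuda_kernel.cu") are hypothetical placeholders, not the FlowNet 2.0 files. BuildExtension.with_options(use_ninja=False) tells the build to fall back to the slower distutils backend instead of generating a ninja file.

```python
# setup.py: sketch for building one custom layer without ninja.
# "example_cuda", "example_cuda.cc" and "example_cuda_kernel.cu" are
# hypothetical placeholders; substitute the layer's real names.
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name="example_cuda",
    ext_modules=[
        CUDAExtension(
            "example_cuda",
            ["example_cuda.cc", "example_cuda_kernel.cu"],
        )
    ],
    # use_ninja=False makes BuildExtension use the distutils backend
    # rather than the (default) ninja backend.
    cmdclass={"build_ext": BuildExtension.with_options(use_ninja=False)},
)
```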
Hi @ptrblck, I disabled ninja for the build. The complete stack trace is too long for my terminal, but here are some of the error messages:
/home/rakesh/software/miniconda2/envs/cu101torch16/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/nn/modules/container/sequential.h: In member function ‘ReturnType torch::nn::SequentialImpl::forward(InputTypes&& ...)’:
/home/rakesh/software/miniconda2/envs/cu101torch16/lib/python3.6/site-packages/torch/include/c10/util/Exception.h:333:9: error: ‘str’ is not a member of ‘c10’
/home/rakesh/software/miniconda2/envs/cu101torch16/lib/python3.6/site-packages/torch/include/c10/util/TypeCast.h:57:58: error: ‘apply’ is not a member of ‘c10::maybe_real<true, c10::complex<double> >’
/home/rakesh/software/miniconda2/envs/cu101torch16/lib/python3.6/site-packages/torch/include/c10/util/Optional.h:408:23: error: cannot bind ‘c10::intrusive_ptr<torch::jit::InlinedCallStack>’ lvalue to ‘c10::intrusive_ptr<torch::jit::InlinedCallStack>&&’
contained_val() = std::forward<U>(v);
/home/rakesh/software/miniconda2/envs/cu101torch16/lib/python3.6/site-packages/torch/include/c10/util/Optional.h:408:23: error: no match for ‘operator=’ (operand types are ‘const std::shared_ptr<torch::jit::Graph>’ and ‘std::shared_ptr<torch::jit::Graph>’)
/home/rakesh/software/miniconda2/envs/cu101torch16/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:56: error: no matching function for call to ‘at::Tensor::to(const c10::Device&) const’
auto data = device && tensor.device() != *device ?
Thanks for the stack trace.
You could pipe the log output to a file in case your terminal gets flooded.
That being said, if the posted error is the first one, I would assume that a stale build is causing this issue.
Could you clean the build and update the submodules before trying to rebuild?
python setup.py clean
git submodule update --init --recursive
python setup.py install 2>&1 | tee install.log
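Later errors in a C++ build (like the ones in sequential.h or Optional.h above) can be knock-on effects of an earlier one, so it helps to find the first compiler error in install.log rather than scrolling the terminal. A small sketch of that search, assuming GCC-style diagnostics of the form "path:line:col: error: message"; the function name first_compiler_error is hypothetical, and nvcc's own "file(line): error" format is deliberately not matched.

```python
import re
from typing import Iterable, Optional

# GCC-style diagnostic: "<path>:<line>:<col>: error: <message>"
# (the column part is optional). Plain "error: command failed" lines
# from setuptools have no path:line prefix and are not matched.
_GCC_ERROR = re.compile(r"^.+?:\d+:(?:\d+:)?\s*error:")


def first_compiler_error(lines: Iterable[str]) -> Optional[str]:
    """Return the first GCC-style error line, or None if there is none."""
    for line in lines:
        line = line.rstrip("\n")
        if _GCC_ERROR.match(line):
            return line
    return None
```

Usage: `with open("install.log", errors="replace") as f: print(first_compiler_error(f))`.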
Hi @ptrblck, thanks! I noticed that the first error in the build log was:
/home/rakesh/software/miniconda2/envs/cu101torch16/lib/python3.6/site-packages/torch/include/c10/util/C++17.h:24:2: error: #error You need C++14 to compile PyTorch
  #error You need C++14 to compile PyTorch
I changed the compiler's C++ standard flag to '-std=c++14', which fixed the errors.
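A sketch of that change, assuming each layer's setup.py passes a list of host-compiler (cxx) flags through CUDAExtension's extra_compile_args and that this list held an older -std= flag. The helpers with_cxx_standard and extra_compile_args, and the flag values in the usage, are hypothetical and not taken from the repository. The helper drops any existing -std= flag and appends -std=c++14, which is what the "#error You need C++14" check in c10/util/C++17.h asks for.

```python
from typing import Dict, List, Sequence


def with_cxx_standard(args: Sequence[str], std: str = "c++14") -> List[str]:
    """Copy args, removing any -std=... flag and appending -std=<std>."""
    # Drop previous -std flags so only one language standard is requested.
    kept = [a for a in args if not a.startswith("-std=")]
    return kept + ["-std=" + std]


def extra_compile_args(cxx_args: Sequence[str],
                       nvcc_args: Sequence[str]) -> Dict[str, List[str]]:
    """Build the dict passed as CUDAExtension(..., extra_compile_args=...).

    Only the host (cxx) flags are changed; nvcc flags are passed through.
    """
    return {"cxx": with_cxx_standard(cxx_args), "nvcc": list(nvcc_args)}


# Hypothetical flag lists, for illustration only.
example = extra_compile_args(["-std=c++11"],
                             ["-gencode", "arch=compute_70,code=sm_70"])
```

In setup.py the result would be passed as `CUDAExtension(name, sources, extra_compile_args=extra_compile_args(cxx_args, nvcc_args))`. Whether nvcc also needs an explicit standard flag depends on the CUDA toolkit and host compiler, so this sketch leaves the nvcc flags untouched.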