I ran into this error while building a CUDA file as a PyTorch C++/CUDA extension, along with a series of CUB-related errors. However, if I compile the same file directly with nvcc, it builds cleanly. I'd like to know how to resolve this.

Command that compiles without error: nvcc -o main main.cu
setup.py
from setuptools import setup, find_packages
from torch.utils.cpp_extension import BuildExtension, CUDAExtension, CppExtension

setup(
    name="xxx",
    include_dirs=["."],
    ext_modules=[
        CUDAExtension(
            "xxx",
            sources=[
                "xxx.cu", "xxx1.cu", "xxx2.cu", "xxx3.cpp",
            ],
            extra_compile_args={
                'cxx': ['-std=c++14', '-g',
                        '-fPIC',
                        '-Ofast',
                        '-DSXN_REVISED',
                        '-Wall', '-fopenmp', '-march=native'],
                'nvcc': ['-std=c++14',
                         '-g',
                         '-DSXN_REVISED',
                         '--compiler-options', "'-fPIC'",
                         ],
            },
        )
    ],
    cmdclass={
        "build_ext": BuildExtension,
    },
)
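For reference, I build the extension with the usual setuptools invocation (the in-place variant shown here is just how I happen to run it):

```shell
# Build the CUDA extension defined in setup.py; compiled .cu sources
# should be handed to nvcc, .cpp sources to the host compiler.
python setup.py build_ext --inplace
```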
Related file:
#include <iostream>
#include <curand_kernel.h>
#include <vector>
#include <chrono>
#include <numeric>
#include <fstream>
#include <algorithm>
#include <map>
#include <sstream>
#include <cassert>
#include <cuda_runtime.h>
#include <stdint.h>
#include <cub/cub.cuh>
#include <torch/extension.h>
inline __device__ int64_t
AtomicCAS(int64_t* const address, const int64_t compare, const int64_t val) {
using Type = unsigned long long int; // NOLINT
static_assert(sizeof(Type) == sizeof(*address), "Type width must match");
return atomicCAS(
reinterpret_cast<Type*>(address), static_cast<Type>(compare),
static_cast<Type>(val));
}
The key part of the error message is:

error: ‘atomicCAS’ was not declared in this scope; did you mean ‘AtomicCAS’?
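To isolate the problem, I can reproduce the nvcc-only success with a minimal standalone file containing just the wrapper; the kernel below is hypothetical, added only so the wrapper is actually instantiated. This compiles fine with, e.g., `nvcc -std=c++14 -c repro.cu`:

```cuda
#include <stdint.h>
#include <cuda_runtime.h>

// Same wrapper as in the extension source: CAS on a signed 64-bit value,
// routed through the unsigned long long int overload of atomicCAS.
inline __device__ int64_t
AtomicCAS(int64_t* const address, const int64_t compare, const int64_t val) {
  using Type = unsigned long long int;  // NOLINT
  static_assert(sizeof(Type) == sizeof(*address), "Type width must match");
  return atomicCAS(
      reinterpret_cast<Type*>(address), static_cast<Type>(compare),
      static_cast<Type>(val));
}

// Hypothetical kernel that exercises the wrapper (illustration only).
__global__ void cas_kernel(int64_t* slot) {
  AtomicCAS(slot, 0, 1);
}
```

Since `atomicCAS` is a device-side built-in, it is only in scope when the file is compiled as CUDA device code; my (unconfirmed) guess is that in the extension build this translation unit is somehow not going through nvcc's device compilation path, but I haven't been able to verify that.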