Error occurred while compiling CUDA program using PyTorch

I encountered this issue while compiling a CUDA file using PyTorch, along with a series of errors related to CUB. However, if I directly compile the corresponding file using nvcc, no issues arise. I’m eager to know how to resolve this problem.
Command that compiles without error: nvcc -o main *.cu

from setuptools import setup, find_packages
from torch.utils.cpp_extension import BuildExtension, CUDAExtension, CppExtension

    name = "xxx",
    include_dirs = ["."],
    ext_modules = [
        sources = [
                'cxx': ['-std=c++14', '-g',
                        '-Wall', '-fopenmp', '-march=native'],
                'nvcc': ['-std=c++14',
                         '--compiler-options', "'-fPIC'",
        "build_ext": BuildExtension 
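
For comparison, a complete build script using the same flags might look like the sketch below. The extension name `xxx_cuda` and the source file `kernels.cu` are hypothetical placeholders that do not appear in the post, and the choice of `CUDAExtension` over `CppExtension` is an assumption. The torch import is deferred so the settings can be inspected without triggering a build.

```python
import sys

# Hypothetical placeholders: "xxx_cuda" and "kernels.cu" are not from the post.
EXTENSION = dict(
    name="xxx_cuda",
    # The .cu suffix is what makes BuildExtension hand the file to nvcc.
    sources=["kernels.cu"],
    extra_compile_args={
        # Flags for the host C++ compiler (used for non-.cu sources).
        "cxx": ["-std=c++14", "-g", "-Wall", "-fopenmp", "-march=native"],
        # Flags for nvcc (used for .cu sources).
        "nvcc": ["-std=c++14", "--compiler-options", "'-fPIC'"],
    },
)


def main():
    # Imported here so that reading EXTENSION does not require torch.
    from setuptools import setup
    from torch.utils.cpp_extension import BuildExtension, CUDAExtension

    setup(
        name="xxx",
        include_dirs=["."],
        ext_modules=[CUDAExtension(**EXTENSION)],
        cmdclass={"build_ext": BuildExtension},
    )


# setuptools expects a command such as build_ext; do nothing without one.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

It would typically be invoked as `python setup.py build_ext --inplace` or through `pip install .`.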

Related file

#include <iostream>
#include <curand_kernel.h>
#include <vector>
#include <chrono>
#include <numeric>
#include <fstream>
#include <algorithm>
#include <map>
#include <sstream>
#include <cassert>
#include <cuda_runtime.h>
#include <stdint.h>
#include <cub/cub.cuh>
#include <torch/extension.h>

inline __device__ int64_t
AtomicCAS(int64_t* const address, const int64_t compare, const int64_t val) {
  using Type = unsigned long long int;  // NOLINT

  static_assert(sizeof(Type) == sizeof(*address), "Type width must match");

  return atomicCAS(
      reinterpret_cast<Type*>(address), static_cast<Type>(compare),
      static_cast<Type>(val));
}

In brief, the error message is: error: ‘atomicCAS’ was not declared in this scope; did you mean ‘AtomicCAS’?
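
The wording of this diagnostic ("was not declared in this scope; did you mean") is GCC's style, which suggests, though does not prove, that the host C++ compiler rather than nvcc is the one seeing `atomicCAS`. One thing worth checking is which entries of `sources` get routed to the host compiler. The sketch below assumes that `BuildExtension` sends only files ending in `.cu` or `.cuh` to nvcc, which is my understanding of its behavior on CUDA builds. The helper name `host_compiled_sources` and the file names in the usage line are hypothetical.

```python
from pathlib import Path

# Assumption: these are the suffixes BuildExtension treats as CUDA files;
# everything else is compiled by the host C++ compiler (e.g. g++).
NVCC_SUFFIXES = {".cu", ".cuh"}


def host_compiled_sources(sources):
    """Return the source files that would NOT be compiled by nvcc."""
    return [s for s in sources if Path(s).suffix not in NVCC_SUFFIXES]


# Hypothetical file names, for illustration only.
print(host_compiled_sources(["kernels.cu", "bindings.cpp"]))
```

If the file that defines `AtomicCAS` shows up in that list, or is pulled in via `#include` from a file that does, giving it a `.cu` suffix would be a reasonable first thing to try.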