Can't compile extension using Makefile - need guidance

I’m going insane trying to compile extension code that builds fine with torch.utils.cpp_extension.CppExtension or setuptools. I want to compile a separate .cpp program that simply exercises the CUDA kernel, so that I can profile the kernel's behavior with NVVP. The catch is that the .cpp code uses torch tensors from ATen.
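For reference, the setuptools/CppExtension route that does build is roughly the following; the module and file names here are just placeholders:

# setup.py -- placeholder names; this is the build path that works for me
from setuptools import setup
from torch.utils.cpp_extension import CppExtension, BuildExtension

setup(
    name='my_kernel',
    ext_modules=[CppExtension('my_kernel', ['my_kernel.cpp'])],
    cmdclass={'build_ext': BuildExtension},
)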

With the Makefile I currently have, using even the tiniest bit of ATen code produces bizarre linker errors like the ones shown at the bottom of this post.

Can someone help me out of this mess? Should I try to learn to use a CMake file? Where would I go to learn to do that?

Makefile that I currently have:

NVCC=nvcc
RM=rm -rf
PYTHON_HEADER_DIR := $(shell python -c 'from distutils.sysconfig import get_python_inc; print(get_python_inc())')
PYTORCH_INCLUDES := $(shell python -c 'from torch.utils.cpp_extension import include_paths; [print(p) for p in include_paths()]')
PYTORCH_LIBRARIES := $(shell python -c 'from torch.utils.cpp_extension import library_paths; [print(p) for p in library_paths()]')

# CUDA ROOT DIR that contains bin/ lib64/ and include/
# CUDA_DIR := /usr/local/cuda
CUDA_DIR := $(shell python -c 'from torch.utils.cpp_extension import _find_cuda_home; print(_find_cuda_home())')
INCLUDE_DIRS := ./ $(CUDA_DIR)/include
INCLUDE_DIRS += $(PYTHON_HEADER_DIR)
INCLUDE_DIRS += $(PYTORCH_INCLUDES)
COMMON_FLAGS += $(foreach includedir,$(INCLUDE_DIRS),-I$(includedir)) -DTORCH_API_INCLUDE_EXTENSION_H -D_GLIBCXX_USE_CXX11_ABI=0


CUDA_ARCH := -gencode arch=compute_35,code=sm_35 \
                -gencode arch=compute_50,code=sm_50 \
                -gencode arch=compute_52,code=sm_52

#LIBRARIES += stdc++ cudart c10 caffe2 torch torch_python caffe2_gpu
LIBS= -lopenblas -lpthread -lcudart -lcublas
LIBS += $(PYTORCH_LIBRARIES)

NVCCFLAGS= -std=c++11 -c $(CUDA_ARCH) -ccbin=$(CXX) -Xcompiler -fPIC $(COMMON_FLAGS)
CCFLAGS= -O3 -std=c++11 $(COMMON_FLAGS)


all: build

build: cpu gpu
        $(NVCC) $(NVCCFLAGS) -o test *.o

cpu:
        $(CC) $(CCFLAGS) *.cpp

gpu:
        $(NVCC) $(NVCCFLAGS) *.cu

clean:
        $(RM) test *.o

I'm trying to use make to compile just this test file (no others are needed):

#include <ATen/ATen.h>

int main() {
        at::Tensor a = at::ones({2, 2}, at::kInt);
        return 0;
}

Running make then fails with output like this:

g++ -std=c++11 -O3 -std=c++11 -I./ -I/usr/local/cuda/include -I/usr/local/anaconda/envs/base3/include/python3.6m -I/usr/local/anaconda/envs/base3/lib/python3.6/site-packages/torch/lib/include -I/usr/local/anaconda/envs/base3/lib/python3.6/site-packages/torch/lib/include/torch/csrc/api/include -I/usr/local/anaconda/envs/base3/lib/python3.6/site-packages/torch/lib/include/TH -I/usr/local/anaconda/envs/base3/lib/python3.6/site-packages/torch/lib/include/THC -DTORCH_API_INCLUDE_EXTENSION_H -D_GLIBCXX_USE_CXX11_ABI=0  *.cpp
/tmp/ccGWWEwi.o: In function `c10::Device::Device(c10::DeviceType, short)':
a.cpp:(.text._ZN3c106DeviceC2ENS_10DeviceTypeEs[_ZN3c106DeviceC5ENS_10DeviceTypeEs]+0xa4): undefined reference to `c10::Error::Error(c10::SourceLocation, std::string const&)'
a.cpp:(.text._ZN3c106DeviceC2ENS_10DeviceTypeEs[_ZN3c106DeviceC5ENS_10DeviceTypeEs]+0x172): undefined reference to `c10::Error::Error(c10::SourceLocation, std::string const&)'
/tmp/ccGWWEwi.o: In function `c10::intrusive_ptr<at::TensorImpl, at::UndefinedTensorImpl>::reset_()':
a.cpp:(.text._ZN3c1013intrusive_ptrIN2at10TensorImplENS1_19UndefinedTensorImplEE6reset_Ev[_ZN3c1013intrusive_ptrIN2at10TensorImplENS1_19UndefinedTensorImplEE6reset_Ev]+0xe): undefined reference to `at::UndefinedTensorImpl::_singleton'
a.cpp:(.text._ZN3c1013intrusive_ptrIN2at10TensorImplENS1_19UndefinedTensorImplEE6reset_Ev[_ZN3c1013intrusive_ptrIN2at10TensorImplENS1_19UndefinedTensorImplEE6reset_Ev]+0x4f): undefined reference to `at::UndefinedTensorImpl::_singleton'
/tmp/ccGWWEwi.o: In function `main':
a.cpp:(.text.startup+0x47): undefined reference to `caffe2::detail::_typeMetaDataInstance_preallocated_3'
a.cpp:(.text.startup+0x99): undefined reference to `c10::impl::device_guard_impl_registry'
a.cpp:(.text.startup+0x171): undefined reference to `at::native::ones(c10::ArrayRef<long>, at::TensorOptions const&)'
a.cpp:(.text.startup+0x2cb): undefined reference to `c10::operator<<(std::ostream&, c10::DeviceType)'
a.cpp:(.text.startup+0x3d8): undefined reference to `c10::Error::Error(c10::SourceLocation, std::string const&)'
collect2: error: ld returned 1 exit status
Makefile:36: recipe for target 'cpu' failed
make: *** [cpu] Error 1
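A side note on the Makefile itself: the undefined symbols above are all c10::/at::/caffe2:: symbols, which live in the PyTorch core libraries, yet the LIBS variable is never used by any rule and PYTORCH_LIBRARIES expands to library directories, not -L/-l flags. Purely as a sketch (the variable names are mine, and the library set is taken from the commented-out LIBRARIES line; the exact set depends on the PyTorch version), the link step would need something along these lines:

# Sketch only: turn the torch library directories into -L/-rpath flags and
# name the libraries explicitly instead of leaving LIBS unused.
TORCH_LDFLAGS := $(foreach librarydir,$(PYTORCH_LIBRARIES),-L$(librarydir)) \
                 $(foreach librarydir,$(PYTORCH_LIBRARIES),-Wl,-rpath,$(librarydir)) \
                 -L$(CUDA_DIR)/lib64
TORCH_LDLIBS  := -lc10 -lcaffe2 -ltorch -lcudart

test: cpu gpu
        $(CXX) -o test *.o $(TORCH_LDFLAGS) $(TORCH_LDLIBS)

The cpu and gpu rules would also need -c so they stop at object files and leave the linking to this rule.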

What is the version of your g++ compiler?
Per https://pytorch.org/tutorials/advanced/cpp_extension.html:

Due to ABI versioning issues, the compiler you use to build your C++ extension must be ABI-compatible with the compiler PyTorch was built with. In practice, this means that you must use GCC version 4.9 and above on Linux. For Ubuntu 16.04 and other more-recent Linux distributions, this should be the default compiler already. On MacOS, you must use clang (which does not have any ABI versioning issues). In the worst case, you can build PyTorch from source with your compiler and then build the extension with that same compiler.
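One way to check both sides of that, assuming a PyTorch build recent enough to expose torch.compiled_with_cxx11_abi():

# Compiler version on the build machine
g++ --version

# Which C++ ABI PyTorch itself was built with; the -D_GLIBCXX_USE_CXX11_ABI
# value in the Makefile has to match this (False -> 0, True -> 1).
python -c 'import torch; print(torch.compiled_with_cxx11_abi())'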