Where else do I need to 'register' my custom native ATen function?

I want to implement my own version of Tensor.trace() by following along the native/README.md.

As an initial test, I just duplicated the code corresponding to trace in
aten/src/ATen/native/native_functions.yaml,
aten/src/ATen/native/ReduceOps.cpp,
aten/src/ATen/native/cuda/TriangularOps.cu and
tools/autograd/derivatives.yaml,
and added the prefix ‘my_’ to the relevant declarations. I now have the following:
native_functions.yaml:

- func: my_trace(Tensor self) -> Tensor
  variants: method, function
  dispatch:
    CPU: my_trace_cpu
    CUDA: my_trace_cuda

- func: my_trace_backward(Tensor grad, int[] sizes) -> Tensor
  variants: function
  device_check: NoCheck
  device_guard: False
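One detail worth noting (this is my reading of the codegen, not something the README spells out): a `func:` entry with no `dispatch:` section gets a default CompositeImplicitAutograd kernel, and the generated headers declare it as `at::native::my_trace_backward`. That declaration must be matched by a definition that actually gets compiled into libtorch_cpu. Spelling the default out explicitly would look something like this (hypothetical, equivalent to the default):

```yaml
# Hypothetical: make the default dispatch explicit, so it is clear which
# symbol the codegen expects at::native to provide.
- func: my_trace_backward(Tensor grad, int[] sizes) -> Tensor
  variants: function
  device_check: NoCheck
  device_guard: False
  dispatch:
    CompositeImplicitAutograd: my_trace_backward
```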

derivatives.yaml:

- name: my_trace(Tensor self) -> Tensor
  self: my_trace_backward(grad, self.sizes())

ReduceOps.cpp:

Tensor my_trace_cpu(const Tensor& self) { ... }

TriangularOps.cu:

Tensor my_trace_cuda(const Tensor& self) { ... }
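For orientation, the math both kernels implement is just the sum of the diagonal. A self-contained sketch of that (plain C++ on a flat row-major buffer, not the actual ATen kernel, which works on strided Tensor data):

```cpp
#include <vector>
#include <cstddef>
#include <algorithm>

// Reference trace of a row-major rows x cols matrix: sum of a(i, i).
double my_trace_ref(const std::vector<double>& a,
                    std::size_t rows, std::size_t cols) {
    double sum = 0.0;
    for (std::size_t i = 0; i < std::min(rows, cols); ++i) {
        sum += a[i * cols + i];  // element (i, i) in row-major layout
    }
    return sum;
}
```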

Something still seems to be missing, though, because the build fails with the following error:

FAILED: bin/conv_to_nnpack_transform_test
: && /usr/lib/ccache/c++ -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow -DHAVE_AVX_CPU_DEFINITION -DHAVE_AVX2_CPU_DEFINITION -O3 -DNDEBUG -DNDEBUG -rdynamic -Wl,-Bsymbolic-functions caffe2/CMakeFiles/conv_to_nnpack_transform_test.dir/transforms/conv_to_nnpack_transform_test.cc.o -o bin/conv_to_nnpack_transform_test  -Wl,-rpath,/home/me/pytorch/build/lib:  lib/libgtest_main.a  -Wl,--no-as-needed,"/home/me/pytorch/build/lib/libtorch.so" -Wl,--as-needed  -Wl,--no-as-needed,"/home/me/pytorch/build/lib/libtorch_cpu.so" -Wl,--as-needed  lib/libprotobuf.a  lib/libc10.so  -lmkl_intel_lp64  -lmkl_gnu_thread  -lmkl_core  -fopenmp  /usr/lib/x86_64-linux-gnu/libpthread.so  -lm  /usr/lib/x86_64-linux-gnu/libdl.so  lib/libdnnl.a  -ldl  lib/libgtest.a  -pthread && :
/usr/bin/ld: /home/me/pytorch/build/lib/libtorch_cpu.so: undefined reference to `at::native::my_trace_backward(at::Tensor const&, c10::ArrayRef<long>)'
collect2: error: ld returned 1 exit status
[1291/1581] Building CXX object test_tensorexpr/CMakeFiles/test_tensorexpr.dir/test_simplify.cpp.o
ninja: build stopped: subcommand failed.

The error message mentions libtorch, but the README.md doesn’t mention it at all. What did I miss?

What does the signature of your my_trace_backward function look like?

Tensor my_trace_backward(const Tensor& grad, IntArrayRef sizes)

It’s just a copy of trace_backward.
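For what it’s worth, the math that backward implements: since trace(A) = Σᵢ A(i, i), the derivative with respect to A(i, j) is 1 on the diagonal and 0 elsewhere, so the backward pass just scatters the incoming scalar grad onto the diagonal of a zero matrix of the original shape. A self-contained sketch of that math (plain C++, not the ATen code):

```cpp
#include <vector>
#include <cstddef>
#include <algorithm>

// Gradient of trace: place `grad` at every diagonal position of a
// zero-initialized rows x cols matrix (row-major layout).
std::vector<double> my_trace_backward_ref(double grad,
                                          std::size_t rows, std::size_t cols) {
    std::vector<double> grad_input(rows * cols, 0.0);
    for (std::size_t i = 0; i < std::min(rows, cols); ++i) {
        grad_input[i * cols + i] = grad;  // element (i, i)
    }
    return grad_input;
}
```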