Where are `at::_ops::***::call` implementations?

Hi, I’m trying to understand where and how to find the CUDA implementations of various operators. Right now I'm looking specifically at the `_pad_enum` op. In my code I call:

```cpp
namespace F = torch::nn::functional;  // from <torch/torch.h>

auto options =
    F::PadFuncOptions({/*left*/ 0, /*right*/ 0, /*top*/ 0, /*bottom*/ pad_prob_m});
F::pad(tensor, options);
```

which calls `at::_pad_enum()`. In my version of libtorch (libtorch-cxx11-abi-shared-with-deps-1.13.1+cu117.zip), that function is defined in `libtorch/include/ATen/ops/_pad_enum.h`, and it just calls `at::_ops::_pad_enum::call(self, pad, mode, value)`. If I look at `ATen/ops/_pad_enum_ops.h`, I see this struct:

```cpp
struct TORCH_API _pad_enum {
  using schema = at::Tensor (const at::Tensor &, at::IntArrayRef, int64_t, c10::optional<double>);
  using ptr_schema = schema*;
  // See Note [static constexpr char* members for windows NVCC]
  STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA(schema_str, "_pad_enum(Tensor self, int[] pad, int mode, float? value=None) -> Tensor")
  static at::Tensor call(const at::Tensor & self, at::IntArrayRef pad, int64_t mode, c10::optional<double> value);
  static at::Tensor redispatch(c10::DispatchKeySet dispatchKeySet, const at::Tensor & self, at::IntArrayRef pad, int64_t mode, c10::optional<double> value);
};
```

So my questions are:

  1. Where is the underlying implementation here so I can look at the CUDA code?
  2. If I set a breakpoint there with cuda-gdb and build from source with the Debug build type, will it build with `nvcc -g -G`, and will I be able to hit my breakpoint? (I assume I can’t simply step into it from, e.g., `torch::nn::functional` because of this dispatch mechanism.)
  3. Is there any documentation describing how things like these include files are generated and how these calls are dispatched so I can understand this all better and be able to find any CUDA code I want?


Ah, I believe that’s a consequence of the codegen step that sits between ATen ops and their underlying implementations. It generates source files that aren't really meant for human consumption, like `_pad_enum.h`, reducing code duplication at the expense of readability. I checked the git blame for that file, which reveals this PR to be the last one to make major changes:

From there you can infer that the underlying ops are probably things like `replication_pad*` and `reflection_pad*`, whose GPU versions are probably implemented in the `ATen/native/cuda/` directory, e.g.:
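To make that inference a bit more concrete: as far as I can tell, `_pad_enum` has no CUDA kernel of its own. It looks like a device-agnostic (composite) op, in recent source trees living in `aten/src/ATen/native/PadNd.cpp` if I'm reading it right, that validates `pad` and then forwards to another op chosen from `mode` and the number of padded dimensions. The sketch below only reproduces that selection logic as I understand it; `PadMode` and `pad_target_op` are hypothetical names, and the real enum values and error types may differ.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical mirror of the padding modes; the real ATen enum's integer
// values are NOT assumed here.
enum class PadMode { Constant, Reflect, Replicate, Circular };

// Returns the name of the op that (as far as I can tell) _pad_enum forwards to.
// `pad_len` is the length of the pad list, `input_dim` the tensor's rank.
std::string pad_target_op(PadMode mode, std::size_t pad_len, long input_dim) {
  if (pad_len % 2 != 0) {
    throw std::invalid_argument("Padding length must be divisible by 2");
  }
  // Constant padding works for any number of dims via a single generic op.
  if (mode == PadMode::Constant) return "constant_pad_nd";

  // Number of padded (spatial) dims: one (left, right) pair per dim.
  const long spatial = static_cast<long>(pad_len / 2);
  // Non-constant modes need one or two leading (batch/channel) dims:
  // 1D -> 2D/3D input, 2D -> 3D/4D input, 3D -> 4D/5D input.
  if (spatial < 1 || spatial > 3 || input_dim < spatial + 1 ||
      input_dim > spatial + 2) {
    throw std::invalid_argument("unsupported non-constant padding");
  }
  const std::string nd = std::to_string(spatial) + "d";
  switch (mode) {
    case PadMode::Reflect:   return "reflection_pad" + nd;
    case PadMode::Replicate: return "replication_pad" + nd;
    case PadMode::Circular:  return "_pad_circular";
    default: break;
  }
  throw std::logic_error("unreachable");
}
```

With the options from the question (a 4-element pad list and `PadFuncOptions`' default constant mode), this points at `constant_pad_nd` rather than `reflection_pad2d` or `replication_pad2d`, so for that particular call `constant_pad_nd` is the place to look.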

In general, understanding the dispatching mechanism in its entirety is a bit daunting (for me), so I typically just check git blame, which will hopefully reveal when the corresponding files have moved around in PRs, or use a profiler such as nsys or nvprof to reveal the names of the kernels being called.
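For what it's worth, the core idea of the dispatcher is smaller than the machinery around it: each op has a table of kernels keyed by dispatch key (CPU, CUDA, Autograd, ...), and `at::_ops::<op>::call` looks up the right entry and forwards to it. Ops without backend-specific kernels get one composite implementation that calls other ops, which are dispatched again. That's roughly why stepping from `torch::nn::functional` down to a CUDA kernel passes through several layers. The toy model below illustrates only that lookup; `ToyDispatcher`, `DispatchKey`, `register_toy_ops`, and the kernel strings are hypothetical, and this is not the real `c10::Dispatcher` API (which also handles autograd, boxed kernels, fallthroughs, and more).

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical, heavily simplified stand-in for c10::DispatchKey.
enum class DispatchKey { CPU, CUDA };

class ToyDispatcher;

// A toy "kernel" returns a description of what would run. Kernels receive the
// dispatcher so composite kernels can call (and thereby re-dispatch) other ops.
using Kernel = std::function<std::string(const ToyDispatcher&, DispatchKey)>;

// NOT the real c10::Dispatcher: one entry per op name, holding per-backend
// kernels plus an optional composite kernel used for any backend.
class ToyDispatcher {
 public:
  void register_backend(const std::string& op, DispatchKey key, Kernel k) {
    table_[op].backend[key] = std::move(k);
  }
  void register_composite(const std::string& op, Kernel k) {
    table_[op].composite = std::move(k);
  }
  // Plays the role of at::_ops::<op>::call: find a kernel for `key`, forward to it.
  std::string call(const std::string& op, DispatchKey key) const {
    const Entry& entry = table_.at(op);  // throws std::out_of_range for unknown ops
    auto it = entry.backend.find(key);
    if (it != entry.backend.end()) return it->second(*this, key);
    if (entry.composite) return entry.composite(*this, key);
    throw std::runtime_error("no kernel registered for " + op);
  }

 private:
  struct Entry {
    std::map<DispatchKey, Kernel> backend;
    Kernel composite;
  };
  std::map<std::string, Entry> table_;
};

// Hypothetical registrations loosely mirroring this thread (mode selection omitted).
void register_toy_ops(ToyDispatcher& d) {
  // A backend op: separate CPU and CUDA kernels (the CUDA one would live in a .cu file).
  d.register_backend("reflection_pad2d", DispatchKey::CPU,
      [](const ToyDispatcher&, DispatchKey) { return std::string("reflection_pad2d CPU kernel"); });
  d.register_backend("reflection_pad2d", DispatchKey::CUDA,
      [](const ToyDispatcher&, DispatchKey) { return std::string("reflection_pad2d CUDA kernel"); });
  // A composite op: one device-agnostic implementation that just calls another op.
  d.register_composite("_pad_enum", [](const ToyDispatcher& self, DispatchKey key) {
    return "_pad_enum -> " + self.call("reflection_pad2d", key);
  });
}
```

With these registrations, `d.call("_pad_enum", DispatchKey::CUDA)` returns `"_pad_enum -> reflection_pad2d CUDA kernel"`: the composite layer is shared across devices, and only the inner op has a CUDA-specific kernel. That is why the CUDA source you're after is usually filed under the inner op's name rather than anything named after `_pad_enum`.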