Where are `at::_ops::***::call` implementations?

collinmccarthy · February 19, 2023, 7:30pm

Hi, I’m trying to understand where and how to find the CUDA implementations of various operators. Right now I’m looking at the _pad_enum op specifically. In my code I call:

auto options =
    F::PadFuncOptions({/*left*/ 0, /*right*/ 0, /*top*/ 0, /*bottom*/ pad_prob_m})
        .mode(torch::kConstant)
        .value(0);
F::pad(tensor, options);

which calls at::_pad_enum(). In my version of libtorch (libtorch-cxx11-abi-shared-with-deps-1.13.1+cu117.zip), that implementation is in libtorch/include/ATen/ops/_pad_enum.h and calls at::_ops::_pad_enum::call(self, pad, mode, value). If I look at _pad_enum_ops.h, where at::_ops::_pad_enum is declared, I see this struct:

struct TORCH_API _pad_enum {
  using schema = at::Tensor (const at::Tensor &, at::IntArrayRef, int64_t, c10::optional<double>);
  using ptr_schema = schema*;
  // See Note [static constexpr char* members for windows NVCC]
  STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA(name, "aten::_pad_enum")
  STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA(overload_name, "")
  STATIC_CONSTEXPR_STR_INL_EXCEPT_WIN_CUDA(schema_str, "_pad_enum(Tensor self, int[] pad, int mode, float? value=None) -> Tensor")
  static at::Tensor call(const at::Tensor & self, at::IntArrayRef pad, int64_t mode, c10::optional<double> value);
  static at::Tensor redispatch(c10::DispatchKeySet dispatchKeySet, const at::Tensor & self, at::IntArrayRef pad, int64_t mode, c10::optional<double> value);
};

So my questions are:

1. Where is the underlying implementation, so I can look at the CUDA code?
2. If I build from source with the Debug build type and set a breakpoint there with cuda-gdb, will it build with nvcc -g -G, and will I be able to hit my breakpoint? (I assume I can’t step into it from, e.g., torch::nn::functional because of this dispatch mechanism.)
3. Is there any documentation describing how include files like these are generated and how these calls are dispatched, so I can understand this better and find any CUDA code I want?

Thanks,
-Collin

eqy · February 20, 2023, 2:57am

Ah, I believe that’s a consequence of the codegen that sits between ATen ops and their underlying implementations. It can produce not-intended-for-human-consumption source files like _pad_enum.h, which reduce code duplication at the expense of readability. I checked the git blame for that file, which reveals this PR to be the last one to make major changes:

From there you can infer that the underlying names are probably things like replicationpad and reflectionpad, whose GPU versions are probably implemented in the ATen/native/cuda/ directory, e.g.:

pytorch/pytorch/blob/8f1c3c68d3aba5c8898bfb3144988aab6776d549/aten/src/ATen/native/cuda/ReflectionPad.cu

#define TORCH_ASSERT_ONLY_METHOD_OPERATORS
#include <ATen/core/Tensor.h>
#include <ATen/ceil_div.h>
#include <ATen/Dispatch.h>
#include <ATen/cuda/Atomic.cuh>
#include <ATen/cuda/detail/IndexUtils.cuh>
#include <ATen/cuda/CUDAContext.h>
#include <ATen/TensorUtils.h>
#include <ATen/Utils.h>

#ifndef AT_PER_OPERATOR_HEADERS
#include <ATen/Functions.h>
#include <ATen/NativeFunctions.h>
#else
#include <ATen/ops/empty.h>
#include <ATen/ops/zeros_like.h>
#include <ATen/ops/reflection_pad1d_native.h>
#include <ATen/ops/reflection_pad2d_native.h>
#include <ATen/ops/reflection_pad3d_native.h>
#include <ATen/ops/reflection_pad1d_backward_native.h>

This file has been truncated.

pytorch/pytorch/blob/8f1c3c68d3aba5c8898bfb3144988aab6776d549/aten/src/ATen/native/cuda/ReplicationPadding.cu

#define TORCH_ASSERT_ONLY_METHOD_OPERATORS
#include <ATen/core/Tensor.h>
#include <ATen/ceil_div.h>
#include <ATen/Dispatch.h>
#include <ATen/cuda/Atomic.cuh>
#include <ATen/cuda/detail/IndexUtils.cuh>
#include <ATen/cuda/CUDAContext.h>
#include <ATen/TensorUtils.h>
#include <ATen/Utils.h>
#include <c10/util/Exception.h>

#ifndef AT_PER_OPERATOR_HEADERS
#include <ATen/Functions.h>
#include <ATen/NativeFunctions.h>
#else
#include <ATen/ops/empty_like.h>
#include <ATen/ops/replication_pad1d_native.h>
#include <ATen/ops/replication_pad1d_backward_native.h>
#include <ATen/ops/replication_pad2d_native.h>
#include <ATen/ops/replication_pad2d_backward_native.h>

This file has been truncated.
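
For intuition about what these pad kernels compute, here is a minimal CPU-only sketch of the three common modes on a 1-D sequence. It is not the ATen code: PadMode and pad1d are hypothetical names made up for this illustration, and the enum values are not the integer mode encoding that _pad_enum takes. It assumes a non-empty input and, for reflect, padding smaller than the input length.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical enum for this sketch only; NOT the integer `mode`
// encoding used by aten::_pad_enum.
enum class PadMode { Constant, Reflect, Replicate };

// Pads a 1-D sequence on the left and right (illustrative CPU code).
// Assumes `in` is non-empty and, for Reflect, left/right < in.size().
std::vector<double> pad1d(const std::vector<double>& in, int left, int right,
                          PadMode mode, double value = 0.0) {
  const int n = static_cast<int>(in.size());
  std::vector<double> out;
  out.reserve(static_cast<std::size_t>(left + n + right));
  for (int o = 0; o < left + n + right; ++o) {
    int i = o - left;  // index into the input; may fall outside [0, n)
    if (i >= 0 && i < n) {
      out.push_back(in[i]);  // interior: copy through
    } else if (mode == PadMode::Constant) {
      out.push_back(value);  // fill with the given constant
    } else if (mode == PadMode::Replicate) {
      // repeat the nearest edge element
      out.push_back(in[std::min(std::max(i, 0), n - 1)]);
    } else {
      // Reflect: mirror about the edge element without repeating it
      if (i < 0) i = -i;
      if (i >= n) i = 2 * (n - 1) - i;
      out.push_back(in[i]);
    }
  }
  return out;
}
```

For example, padding {1, 2, 3} with 2 on the left and 1 on the right gives {3, 2, 1, 2, 3, 2} for reflect, {1, 1, 1, 2, 3, 3} for replicate, and {0, 0, 1, 2, 3, 0} for constant with value 0.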

Understanding the dispatching mechanism in full is a bit daunting (for me), so I typically just check git blame, which hopefully reveals when the corresponding files were moved around in PRs, or use a profiler such as nsys or nvprof to reveal the names of the kernels being called.
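
As a rough mental model of why you can't simply step from a `call` declaration into a kernel, here is a toy dispatcher in plain C++. It is only a sketch under simplifying assumptions: ToyTensor, ToyDispatcher, Backend, and registerKernel are hypothetical names, and the real mechanism (dispatch key sets, generated registrations, redispatch) is considerably more involved. The point it illustrates is that `call` looks up a kernel at runtime from a table keyed by operator name and the tensor's backend, so the CUDA code lives in a separately registered function.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-ins for this sketch; not PyTorch types.
enum class Backend { CPU, CUDA };

struct ToyTensor {
  Backend backend;  // where the data lives; this selects the kernel
  int numel;
};

using Kernel = std::function<std::string(const ToyTensor&)>;

class ToyDispatcher {
 public:
  // Each backend registers its own kernel under the same operator name.
  void registerKernel(const std::string& op, Backend b, Kernel k) {
    table_[std::make_pair(op, b)] = std::move(k);
  }

  // Analogue of an `::call` entry point: no implementation here,
  // just a runtime table lookup followed by an indirect call.
  std::string call(const std::string& op, const ToyTensor& t) const {
    auto it = table_.find(std::make_pair(op, t.backend));
    if (it == table_.end()) {
      throw std::runtime_error("no kernel registered for " + op);
    }
    return it->second(t);
  }

 private:
  std::map<std::pair<std::string, Backend>, Kernel> table_;
};
```

Registering one lambda for Backend::CPU and another for Backend::CUDA under "aten::_pad_enum" and then calling with a ToyTensor whose backend is CUDA runs the CUDA lambda. That is also why a breakpoint has to go on the registered kernel itself rather than on the generated entry point.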

Realtyxxx · August 8, 2024, 8:59am

Hi, I also want to understand its dispatcher mechanism.

I wonder whether this is generated by gen.py?