How does PyTorch use C++?

Nick_Halden · October 10, 2020, 11:24am

Hey there,

I started using PyTorch two days ago and have been trying to figure out how it all works.

I was trying to find where ReLU is implemented, but after tracing back through the Python code up to a certain point, I could not find the implementation.

Then I checked GitHub and found the implementation, but where exactly is the compiled binary on Linux? I couldn’t find it. And how and where does the Python code call it?

Thank you.

ranman · October 11, 2020, 5:14am

Slightly out of date but still quite useful blog post: http://blog.ezyang.com/2019/05/pytorch-internals/

I’m still learning myself, but my understanding is that PyTorch uses a dispatcher mechanism to select an appropriate C++ implementation for each operator. This is covered in detail in another great blog post from Ed: http://blog.ezyang.com/2020/09/lets-talk-about-the-pytorch-dispatcher/

Since PyTorch operators have to work across various hardware platforms, tensor types, and data types, the underlying operator implementations can be optimized in C++ by humans, codegen, and compilers.
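To make the dispatch idea a bit more concrete, here is a toy model in plain Python. It is only a sketch of the general pattern (a table that maps an operator name plus a backend key to a kernel function) and not PyTorch's real dispatcher. All names in it (`register`, `call`, the `"fake_accelerator"` backend) are made up for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Toy dispatch table: (operator name, backend key) -> kernel function.
# This is an illustrative model, not PyTorch's actual data structure.
_KERNELS: Dict[Tuple[str, str], Callable[..., List[float]]] = {}


def register(op: str, backend: str):
    """Decorator that files a kernel under (op, backend)."""
    def decorator(fn):
        _KERNELS[(op, backend)] = fn
        return fn
    return decorator


@register("relu", "cpu")
def relu_cpu(values: List[float]) -> List[float]:
    return [v if v > 0 else 0.0 for v in values]


@register("relu", "fake_accelerator")  # hypothetical second backend
def relu_fake_accelerator(values: List[float]) -> List[float]:
    # Stand-in for a kernel written for different hardware.
    return [max(v, 0.0) for v in values]


def call(op: str, backend: str, *args):
    """Look up the kernel for this operator and backend, then run it."""
    try:
        kernel = _KERNELS[(op, backend)]
    except KeyError:
        raise NotImplementedError(
            f"no kernel registered for {op!r} on {backend!r}"
        ) from None
    return kernel(*args)
```

Calling `call("relu", "cpu", [-1.0, 2.0])` picks the CPU kernel, while the same operator name with a different backend key picks a different function. As far as I understand, the real dispatcher works on richer keys than a single string, which the second blog post goes into.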

For relu in particular, it looks like the C++ version just calls threshold:

https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/Activation.cpp#L368


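// (excerpt starts mid-file; the lines down to the first closing brace are the tail of the preceding function)
    Scalar threshold,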
    const Tensor& output) {
  Tensor grad_input;
  auto iter = TensorIterator::binary_op(grad_input, grad_output, output);
  softplus_backward_stub(iter.device_type(), iter, beta, threshold);
  return iter.output();
}

// computes `result = self <= threshold ? value : other`
// other is `self` in threshold() and `grad` in threshold_backward()
static Tensor threshold_out(
    optional<Tensor> opt_result,
    const Tensor& self,
    Scalar threshold,
    Scalar value,
    const Tensor& other) {
  Tensor result = opt_result.value_or(Tensor());
  auto iter = TensorIteratorConfig()
    .set_check_mem_overlap(false)  // threshold is idempotent, so overlap is okay
    .add_output(result)
    .add_input(self)
    // ... (excerpt ends here; the rest of the function is not shown)
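The comment in the excerpt above describes threshold as `result = self <= threshold ? value : other`. Here is a rough sketch of that rule in plain Python on lists, just to show how relu and its backward pass can both fall out of one function. The function names and the choice of 0 as the fill value are my assumptions for illustration, not copied from the PyTorch source.

```python
from typing import List, Optional


def threshold(
    values: List[float],
    threshold_value: float,
    fill_value: float,
    other: Optional[List[float]] = None,
) -> List[float]:
    """result[i] = fill_value if values[i] <= threshold_value else other[i].

    When `other` is omitted it defaults to `values`, which mirrors the
    "other is self in threshold()" note in the C++ comment.
    """
    if other is None:
        other = values
    if len(other) != len(values):
        raise ValueError("values and other must have the same length")
    return [
        fill_value if v <= threshold_value else o
        for v, o in zip(values, other)
    ]


def relu(values: List[float]) -> List[float]:
    # Assumption: relu is threshold with threshold 0 and fill value 0.
    return threshold(values, 0.0, 0.0)


def threshold_backward(
    grad: List[float], values: List[float], threshold_value: float
) -> List[float]:
    # "other is grad in threshold_backward()": the gradient passes through
    # where the input was above the threshold and is zeroed elsewhere.
    return threshold(values, threshold_value, 0.0, other=grad)
```

Applying `relu` twice gives the same result as applying it once, which matches the "threshold is idempotent" remark in the excerpt.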

Here’s the CPU implementation for threshold:

https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cpu/Activation.cpp#L86


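// (excerpt starts mid-file; the lines down to the first closing brace at column 0 are the tail of the preceding kernel)
      },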
      [=](Vec a, Vec b, Vec c) -> Vec {
        auto mask = a < zero_vec;
        auto max_deriv_vec = Vec::blendv(zero_vec, one_vec.neg(), mask);
        auto sign_vec = Vec::blendv(one_vec.neg(), one_vec, mask);
        return (max_deriv_vec + sign_vec * ((b - one_vec) / b)).neg() * c;
      });
  });
}

static void threshold_kernel(
    TensorIterator& iter,
    Scalar threshold_scalar,
    Scalar value_scalar) {
  AT_DISPATCH_ALL_TYPES(iter.dtype(), "threshold_cpu", [&] {
    using Vec = Vec256<scalar_t>;
    scalar_t threshold = threshold_scalar.to<scalar_t>();
    Vec threshold_v = Vec(threshold);
    scalar_t value = value_scalar.to<scalar_t>();
    Vec value_v = Vec(value);
    cpu_kernel_vec(
    // ... (excerpt ends here; the rest of the kernel is not shown)
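The excerpt above is cut off right at the `cpu_kernel_vec(` call. My reading of that pattern is that the kernel supplies two versions of the same operation, one for a single element and one for a whole vector of elements, and the helper uses the vector version on full chunks and the scalar version on whatever is left over. Below is a plain Python sketch of that idea. The helper name `kernel_vec`, the `VEC_WIDTH` of 8, and the list-based "vectors" are all assumptions for illustration; the real code works on SIMD registers.

```python
from typing import Callable, List

VEC_WIDTH = 8  # hypothetical number of lanes per "vector"


def kernel_vec(
    values: List[float],
    other: List[float],
    scalar_op: Callable[[float, float], float],
    vec_op: Callable[[List[float], List[float]], List[float]],
    width: int = VEC_WIDTH,
) -> List[float]:
    """Run vec_op on full chunks of `width` elements, scalar_op on the tail."""
    if len(values) != len(other):
        raise ValueError("inputs must have the same length")
    out: List[float] = []
    n_full = len(values) - len(values) % width
    for start in range(0, n_full, width):
        out.extend(vec_op(values[start:start + width],
                          other[start:start + width]))
    for i in range(n_full, len(values)):
        out.append(scalar_op(values[i], other[i]))
    return out


def threshold_kernel(
    values: List[float], other: List[float], threshold: float, value: float
) -> List[float]:
    # Scalar version: one element at a time.
    def scalar_op(x: float, o: float) -> float:
        return value if x <= threshold else o

    # "Vector" version: same rule applied to a whole chunk at once.
    def vec_op(xs: List[float], os_: List[float]) -> List[float]:
        return [value if x <= threshold else o for x, o in zip(xs, os_)]

    return kernel_vec(values, other, scalar_op, vec_op)
```

With 11 input elements and a width of 8, one chunk goes through `vec_op` and the last three elements go through `scalar_op`; both paths have to compute the same rule or the results would differ depending on the input length.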

Hope that helps!