How does the * operator dispatch to the CUDA mul kernel in PyTorch's backend code?

While learning about the autograd mechanism, I came across the following code (in torch/csrc/autograd/FunctionsManual.cpp, if I am reading the source tree correctly):

template <typename T>
Tensor mul_tensor_backward(Tensor grad, T other, ScalarType self_st) {
  // Gradient w.r.t. self for out = self * other: grad * conj(other).
  // The conj() is a no-op for real tensors.
  auto out = grad * other.conj();
  // If self is real but the product came out complex, take the real part
  // so the gradient matches self's scalar type.
  return handle_r_to_c(self_st, std::move(out));
}
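
To make the question concrete, here is the kind of setup I have in mind (a minimal libtorch sketch; it assumes a CUDA-enabled build, and the shapes are arbitrary):

#include <torch/torch.h>
#include <iostream>

int main() {
  // Two CUDA tensors: the forward mul and the backward pass should both
  // run on the GPU.
  auto a = torch::randn({3}, torch::device(torch::kCUDA)).requires_grad_(true);
  auto b = torch::randn({3}, torch::device(torch::kCUDA));
  auto out = (a * b).sum();
  out.backward();                 // mul_tensor_backward runs in here
  std::cout << a.grad() << '\n';  // equals b, computed on the GPU
}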

I understand that this is the backward function for multiplication: for out = self * other, the gradient with respect to self is grad * conj(other) (the conj() is there to support complex tensors). But this line confuses me:

auto out = grad * other.conj();

If all the tensors involved are CUDA tensors, this multiplication should execute on the GPU. However, I cannot find any overload of operator* for Tensor in the source, so how does PyTorch dispatch this expression to the CUDA mul kernel?
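
What I expected to find was an overload roughly like the one below. This is only a sketch of my mental model, written as a free function my_mul (the name is mine) so it compiles on its own; it is not the actual PyTorch source:

#include <ATen/ATen.h>

// Hypothetical stand-in for the operator* overload I assumed must exist
// somewhere: it just forwards to the mul op on Tensor.
at::Tensor my_mul(const at::Tensor& self, const at::Tensor& other) {
  return self.mul(other);  // but where is the CUDA-vs-CPU choice made?
}

Is there a generated file or a macro that defines the real overload, and at what point does the device-based kernel selection happen?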