While learning about the autograd mechanism, I came across the following code:
template <typename T>
Tensor mul_tensor_backward(Tensor grad, T other, ScalarType self_st) {
  auto out = grad * other.conj();
  return handle_r_to_c(self_st, std::move(out));
}
I understand that this is the backward (derivative) function for multiplication, but this line confuses me:

auto out = grad * other.conj();

If all the tensors involved are CUDA tensors, this code should execute on the GPU. However, I can't find any overload of operator*, so how does PyTorch dispatch it to the CUDA mul kernel?