@tom First of all, thank you for your answer. To clarify my statement: I do not want to mix different kernels, just the dtypes of the kernel arguments. An example:
If all the inputs are of long dtype, the following works:
AT_DISPATCH_INTEGRAL_TYPES(at::ScalarType::Long, "Some long error message", [&] {
    some_cuda_kernel<scalar_t><<<blocks, threads>>>(
        some_long_tensor.packed_accessor<scalar_t, 2, torch::RestrictPtrTraits, size_t>(),
        ...
    );
});
where I have used the packed_accessor.
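On the kernel side this corresponds to a signature along these lines (just a sketch; the kernel name and argument are placeholders):

// Sketch: scalar_t is instantiated as int64_t by the dispatch above.
template <typename scalar_t>
__global__ void some_cuda_kernel(
    const torch::PackedTensorAccessor<scalar_t, 2, torch::RestrictPtrTraits, size_t> input) {
    // ... kernel body ...
}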
If I only have float tensors, the following works:
AT_DISPATCH_FLOATING_TYPES(at::ScalarType::Float, "Some floating error message", [&] {
    some_cuda_kernel<scalar_t><<<blocks, threads>>>(
        some_float_tensor.packed_accessor<scalar_t, 2, torch::RestrictPtrTraits, size_t>(),
        ...
    );
});
How do I dispatch a kernel that takes both some_long_tensor and some_float_tensor?
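For concreteness, the kernel I am after takes accessors of two different dtypes, roughly like this (again just a sketch; float_t and long_t are placeholder names):

// Sketch: one floating-point and one long argument. A single AT_DISPATCH
// macro only binds one scalar_t, so it is unclear how to instantiate both.
template <typename float_t, typename long_t>
__global__ void some_cuda_kernel(
    const torch::PackedTensorAccessor<float_t, 2, torch::RestrictPtrTraits, size_t> float_acc,
    const torch::PackedTensorAccessor<long_t, 2, torch::RestrictPtrTraits, size_t> long_acc) {
    // ... kernel body ...
}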
My attempts with AT_DISPATCH_ALL_TYPES,
AT_DISPATCH_ALL_TYPES(some_float_tensor.scalar_type(), "Hello !", [&] {
    some_cuda_kernel<scalar_t><<<blocks, threads>>>(
        some_float_tensor.packed_accessor<scalar_t, 2, torch::RestrictPtrTraits, size_t>(),
        some_long_tensor.packed_accessor<scalar_t, 2, torch::RestrictPtrTraits, size_t>(),
        ...
    );
});
and with AT_DISPATCH_INTEGRAL_TYPES_AND,
AT_DISPATCH_INTEGRAL_TYPES_AND(at::ScalarType::Float,
                               at::ScalarType::Long,
                               "some combined error message", [&] {
    some_cuda_kernel<scalar_t><<<blocks, threads>>>(...);
});
both failed. As far as I can tell, each dispatch macro instantiates the lambda with a single scalar_t, so both packed accessors are forced to the same dtype and one of the packed_accessor calls cannot match its tensor.
For a raw CUDA kernel this is not a problem, since the dtypes are hard-coded in the signature. So if there is any way to use a raw kernel in this situation, I would appreciate a hint.
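To illustrate the raw route (a sketch; this assumes the dtypes are known to be float and long at the call site, and all names are placeholders):

// Raw kernel with hard-coded dtypes: mixing float and long is straightforward.
__global__ void some_raw_kernel(const float* __restrict__ values,
                                const int64_t* __restrict__ indices,
                                int64_t rows, int64_t cols) {
    // ... kernel body ...
}

// Launch without any dispatch macro, pulling out typed raw pointers.
some_raw_kernel<<<blocks, threads>>>(
    some_float_tensor.data_ptr<float>(),
    some_long_tensor.data_ptr<int64_t>(),
    some_float_tensor.size(0), some_float_tensor.size(1));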