Hi all,
I ran into a build issue when compiling the ROI Pool CUDA kernel on Windows + MSVC + CUDA with T=half. The kernel fails to compile due to an ambiguous comparison operator.
Environment
- OS: Windows 10/11
- Compiler: MSVC 19.x (Visual Studio 2022)
- CUDA: 12.x
- PyTorch / torchvision: LibTorch 2.3.1 + torchvision csrc
Problematic code
In torchvision/csrc/ops/cuda/roi_pool_kernel.cu, inside roi_pool_forward_kernel_impl:
if (offset_input[input_index] > maxval) {
maxval = offset_input[input_index];
maxidx = input_index;
}
Likely cause
- For __half/c10::Half, the availability of operator overloads depends on macros such as __CUDA_NO_HALF_OPERATORS__ and __CUDA_NO_HALF_CONVERSIONS__.
- On MSVC + NVCC, the implicit promotion from half to float appears to be treated more strictly, so the comparison becomes ambiguous.
- On GCC/Clang this may not trigger, which is why others may not have reported it.
I worked around the problem locally with static_cast<float>, but a one-off cast does not feel like the right long-term fix for cases like this.
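For reference, the workaround amounts to casting both operands to float at the comparison site. Here is a minimal host-side sketch of that pattern (T stands in for the kernel's element type; the function name is hypothetical, not from the torchvision source):

```cpp
#include <cstddef>

// Sketch of the workaround: compare via float casts so no half-specific
// operator> overload needs to be selected. T stands in for c10::Half.
template <typename T>
void argmax_with_cast(const T* offset_input, std::size_t n,
                      T& maxval, long& maxidx) {
  for (std::size_t input_index = 0; input_index < n; ++input_index) {
    if (static_cast<float>(offset_input[input_index]) >
        static_cast<float>(maxval)) {
      maxval = offset_input[input_index];
      maxidx = static_cast<long>(input_index);
    }
  }
}
```

This compiles on MSVC because both comparison operands have the same type, but it silently assumes float is a safe accumulation type for every T, which is why the acc_type approach below is preferable.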
Better solution
Following PyTorch conventions, we could use an accumulation type to ensure consistent promotion:
using acc_t = at::acc_type<T, /*is_cuda=*/true>;
acc_t v = static_cast<acc_t>(offset_input[input_index]);
acc_t mv = static_cast<acc_t>(maxval);
if (v > mv) {
maxval = offset_input[input_index];
maxidx = input_index;
}
And also initialize maxval in a type-safe way:
T maxval = is_empty ? T(0) : std::numeric_limits<T>::lowest();
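To make the proposal concrete, here is a self-contained host-side sketch of the whole pattern. The `Half` struct and the `acc` trait are toy stand-ins for c10::Half and at::acc_type (assumptions for illustration only, not the real c10 implementations):

```cpp
#include <cstddef>
#include <limits>

// Toy stand-in for c10::Half: stores a float and converts implicitly both
// ways. (The real half type's operators are gated by macros like
// __CUDA_NO_HALF_OPERATORS__, which is where the MSVC ambiguity comes from.)
struct Half {
  float v;
  Half() : v(0.f) {}
  Half(float f) : v(f) {}
  operator float() const { return v; }
};

// Sketch of at::acc_type<T, /*is_cuda=*/true>: half accumulates in float,
// float/double accumulate in themselves.
template <typename T> struct acc { using type = T; };
template <> struct acc<Half> { using type = float; };

template <typename T>
long argmax_acc(const T* offset_input, std::size_t n) {
  using acc_t = typename acc<T>::type;
  // Type-safe initialization to the lowest representable value.
  T maxval = static_cast<T>(std::numeric_limits<acc_t>::lowest());
  long maxidx = -1;
  for (std::size_t input_index = 0; input_index < n; ++input_index) {
    acc_t v = static_cast<acc_t>(offset_input[input_index]);
    acc_t mv = static_cast<acc_t>(maxval);
    if (v > mv) {  // unambiguous: both operands are acc_t
      maxval = offset_input[input_index];
      maxidx = static_cast<long>(input_index);
    }
  }
  return maxidx;
}
```

The key point is that every comparison happens in acc_t, so overload resolution never has to choose between half's own operators and an implicit promotion, and float/double paths are unchanged.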
Suggestion
Would it make sense to update the ROI Pool kernel to use acc_type (and lowest()) to avoid this build-time ambiguity and preserve precision across float/double/half?
I can open a PR if maintainers think this is the right direction.
Please note that I’m still a junior-level software engineer and this is my first post on the forum. I apologize in advance if anything here falls short of the usual standards, and I appreciate your understanding.