[Solved] maskrcnn_benchmark + sm_90 (or CUDA ≥ 11.8 or torch > 1.10)

So I have maskrcnn_benchmark working with torch==1.10.0+cu111.

Now I am shifting to NVIDIA H100 GPUs, which are sm_90.

sm_90 requires CUDA 11.8, which is supported from torch 1.13.0 onwards.
torch 1.13.0 (and any torch > 1.11) has removed <THC/THC.h>, which means I have to patch the files in maskrcnn_benchmark/csrc/cuda/ to use <ATen/ATen.h> instead, following the steps here: Missing headers in ATen/cuda/DeviceUtils.cuh · Issue #72807 · pytorch/pytorch · GitHub
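For reference, the typical THC-to-ATen substitutions look roughly like this (a sketch of the common replacements, not a complete diff; the exact changes vary per file):

// Old THC-based includes, removed in torch >= 1.11:
//   #include <THC/THC.h>
//   #include <THC/THCAtomics.cuh>
//   #include <THC/THCDeviceUtils.cuh>

// ATen-based replacements:
#include <ATen/ATen.h>
#include <ATen/ceil_div.h>
#include <ATen/cuda/CUDAContext.h>
#include <ATen/cuda/Atomic.cuh>

// Typical call-site substitutions:
//   THCCeilDiv(n, 512)               ->  at::ceil_div(n, 512)
//   THCudaCheck(cudaGetLastError())  ->  AT_CUDA_CHECK(cudaGetLastError())
//   THCudaMalloc(state, nbytes)      ->  c10::cuda::CUDACachingAllocator::raw_alloc(nbytes)
//   THCudaFree(state, ptr)           ->  c10::cuda::CUDACachingAllocator::raw_delete(ptr)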

But once I do all of this I get

maskrcnn_benchmark/csrc/cuda/deform_pool_kernel_cuda.cu(245): error: no instance of overloaded function "atomicAdd" matches the argument list
          argument types are: (c10::Half *, c10::Half)
        detected during instantiation of "void DeformablePSROIPoolBackwardAccKernel(int, const scalar_t *, const scalar_t *, int, scalar_t, int, int, int, int, int, int, scalar_t *, scalar_t *, const scalar_t *, const scalar_t *, const scalar_t *, int, scalar_t, int, int, int, int, int) [with scalar_t=c10::Half]"
(347): here

What do I do now :confused:

[UPDATE] After some help from ChatGPT, the fix is to add a small atomicAdd wrapper:


#include <cuda_fp16.h>
#include <c10/util/Half.h>
#include <type_traits>

// CUDA's atomicAdd has no (c10::Half*, c10::Half) overload, so dispatch on the
// element type: for Half, reinterpret the pointer as __half* (c10::Half is
// layout-compatible with __half) and use the __half overload (sm_70+);
// for float/double, just call the native overloads.
template <typename scalar_t>
__device__ void atomicAddHalf(scalar_t* address, scalar_t val) {
    if constexpr (std::is_same<scalar_t, c10::Half>::value) {
        atomicAdd(reinterpret_cast<__half*>(address), static_cast<__half>(val));
    } else {
        atomicAdd(address, val);
    }
}

Then replace the failing atomicAdd calls in the kernel with atomicAddHalf.
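The call-site change looks roughly like this (placeholder variable names, not the actual ones from deform_pool_kernel_cuda.cu):

// Before: fails to compile when scalar_t = c10::Half, because CUDA's
// atomicAdd has no (c10::Half*, c10::Half) overload
//   atomicAdd(grad_ptr + index, diff_val);

// After: dispatches to the __half overload for Half, native overloads otherwise
atomicAddHalf(grad_ptr + index, diff_val);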

Also, in setup.py:

extra_compile_args["nvcc"] = [
            '-arch=sm_90',
            "-DCUDA_HAS_FP16=1",
            "-D__CUDA_NO_HALF_OPERATORS__",
            "-D__CUDA_NO_HALF_CONVERSIONS__",
            "-D__CUDA_NO_HALF2_OPERATORS__",
        ]

The build is successful. The code is now compatible even with torch 2.4, which doesn’t require the explicit '-arch=sm_90' argument.