C++ extension example for mixed precision and torch.compile

Is there any example of a C++ extension that supports mixed precision training and torch.compile? I'm not sure how to make my CUDA kernel support them.

I think it's best to use the ATen dispatch macros, so your CUDA kernel or C++ function adapts to any floating-point type, especially the 16-bit ones (half/bfloat16). Note that plain AT_DISPATCH_FLOATING_TYPES only covers float and double, so for mixed precision you want the _AND_HALF or _AND2 variant. Here is a call example:

AT_DISPATCH_FLOATING_TYPES_AND2(at::ScalarType::Half, at::ScalarType::BFloat16,
    feats.scalar_type(), "trilinear_fw_cu",
    ([&] {
        // scalar_t is bound to the concrete element type (float, double,
        // half, bfloat16), so one kernel template covers every dtype.
        trilinear_fw_kernel<scalar_t><<<blocks, threads>>>(
            // packed_accessor32 replaces the deprecated packed_accessor;
            // the kernel takes torch::PackedTensorAccessor32 parameters.
            feats.packed_accessor32<scalar_t, 3, torch::RestrictPtrTraits>(),
            points.packed_accessor32<scalar_t, 2, torch::RestrictPtrTraits>(),
            feat_interp.packed_accessor32<scalar_t, 2, torch::RestrictPtrTraits>()
        );
    }));
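
For torch.compile, the kernel should additionally be registered as a custom op through the dispatcher (TORCH_LIBRARY) rather than only bound with pybind11, so it shows up as torch.ops.<namespace>.<op> and the compiler can treat it as an opaque op instead of graph-breaking on it. A minimal sketch, assuming a host-side wrapper named trilinear_fw that contains the dispatch code above (my_ext and trilinear_fw are placeholder names, not from your code):

#include <torch/extension.h>

// Hypothetical host-side entry point wrapping the AT_DISPATCH call above.
torch::Tensor trilinear_fw(torch::Tensor feats, torch::Tensor points);

TORCH_LIBRARY(my_ext, m) {
    // Declares the op schema; the op becomes torch.ops.my_ext.trilinear_fw.
    m.def("trilinear_fw(Tensor feats, Tensor points) -> Tensor");
}

TORCH_LIBRARY_IMPL(my_ext, CUDA, m) {
    // Binds the CUDA implementation to that schema.
    m.impl("trilinear_fw", trilinear_fw);
}

On the Python side you'd still register a fake/meta implementation (torch.library.register_fake in recent releases) that returns an empty tensor of the output shape, so torch.compile can trace shapes without launching the CUDA kernel.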
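
Dtype dispatch alone also doesn't make the op participate in torch.autocast; for that you can register an implementation under the Autocast dispatch key that casts the inputs and re-dispatches. A sketch under the same assumed op name, picking half as the autocast dtype:

#include <ATen/autocast_mode.h>

torch::Tensor trilinear_fw_autocast(torch::Tensor feats, torch::Tensor points) {
    // Exclude the Autocast key so the re-dispatch below doesn't recurse.
    c10::impl::ExcludeDispatchKeyGuard no_autocast(c10::DispatchKey::Autocast);
    // cached_cast only converts eligible floating-point CUDA tensors.
    return trilinear_fw(at::autocast::cached_cast(at::kHalf, feats),
                        at::autocast::cached_cast(at::kHalf, points));
}

TORCH_LIBRARY_IMPL(my_ext, Autocast, m) {
    m.impl("trilinear_fw", trilinear_fw_autocast);
}

With that registered, calling the op inside a torch.autocast("cuda") region casts the inputs to half before your kernel runs, the same way the built-in ops behave.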