How can I write CUDA code to support FP16 calculation?

I want to write a custom layer using CUDA. However, it fails when I use NVIDIA Apex to train the model with mixed precision. What should I do? Is there any example of an FP16 CUDA layer?

You could use AT_DISPATCH_FLOATING_TYPES_AND_HALF to dispatch the kernel for the float16 dtype as well and use scalar_t inside the kernel code (similar to e.g. this code).
Also note that we now recommend using the native mixed-precision training utility via torch.cuda.amp instead of apex/amp.
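
As a rough sketch of what that dispatch pattern looks like, here is a minimal, hypothetical elementwise "square" extension; the names (square_kernel, square_cuda_forward) are made up for illustration, but the AT_DISPATCH_FLOATING_TYPES_AND_HALF / scalar_t structure is the part that makes the kernel compile for float, double, and half:

```cpp
// square_cuda.cu -- minimal sketch of an FP16-capable CUDA extension.
// The layer itself (elementwise square) and its names are hypothetical.
#include <torch/extension.h>
#include <cuda.h>
#include <cuda_runtime.h>

// Kernel templated on scalar_t, so the same code is instantiated for
// float, double, and at::Half.
template <typename scalar_t>
__global__ void square_kernel(const scalar_t* __restrict__ input,
                              scalar_t* __restrict__ output,
                              int64_t numel) {
  const int64_t idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx < numel) {
    output[idx] = input[idx] * input[idx];
  }
}

torch::Tensor square_cuda_forward(torch::Tensor input) {
  auto output = torch::empty_like(input);
  const int64_t numel = input.numel();
  const int threads = 256;
  const int blocks = static_cast<int>((numel + threads - 1) / threads);

  // AT_DISPATCH_FLOATING_TYPES_AND_HALF adds a case for half on top of
  // float/double; inside the lambda, scalar_t is the concrete dtype.
  AT_DISPATCH_FLOATING_TYPES_AND_HALF(
      input.scalar_type(), "square_cuda_forward", ([&] {
        square_kernel<scalar_t><<<blocks, threads>>>(
            input.data_ptr<scalar_t>(),
            output.data_ptr<scalar_t>(),
            numel);
      }));
  return output;
}

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
  m.def("forward", &square_cuda_forward, "Square forward (CUDA)");
}
```

With the half case dispatched like this, the layer can accept the float16 tensors that torch.cuda.amp (or apex) feeds it during mixed-precision training.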

These examples may help.