Half vs Full Precision with CUDA

Also answered here.