CNN fp16 slower than fp32 on Tesla P100

smth · January 11, 2018, 1:01pm

On P100 we don't expect FP16 to be any faster, because we disabled FP16 math on P100 (it is numerically unstable). We use simulated FP16: storage is FP16, but compute is in FP32, so values are upconverted to FP32 before any operation is performed.
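To make the storage-vs-compute distinction concrete, here is a rough sketch in plain Python. It is not the actual PyTorch/cuDNN kernel code; the helper names (`to_fp16`, `to_fp32`, `dot_simulated_fp16`, `dot_native_fp16`) are made up for illustration, and rounding is emulated with the standard `struct` module rather than run on a GPU.

```python
import struct

def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE half-precision value (hypothetical helper)."""
    return struct.unpack("e", struct.pack("e", x))[0]

def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE single-precision value (hypothetical helper)."""
    return struct.unpack("f", struct.pack("f", x))[0]

def dot_simulated_fp16(a, b):
    """Inputs and output are stored as FP16; products and the running sum are FP32."""
    acc = 0.0
    for x, y in zip(a, b):
        # Upconvert the FP16-stored operands, then multiply and accumulate in FP32.
        acc = to_fp32(acc + to_fp32(to_fp16(x) * to_fp16(y)))
    # Round only once, when the result is written back to FP16 storage.
    return to_fp16(acc)

def dot_native_fp16(a, b):
    """Every intermediate result is rounded to FP16, as with pure FP16 math."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_fp16(acc + to_fp16(to_fp16(x) * to_fp16(y)))
    return acc

ones = [1.0] * 4096
print(dot_simulated_fp16(ones, ones))  # 4096.0
print(dot_native_fp16(ones, ones))     # 2048.0, the FP16 accumulator stops growing
```

The pure-FP16 accumulator stalls at 2048 because 2049 is not representable in half precision, which is one simple example of the kind of instability that FP32 compute avoids. The extra conversions also hint at why simulated FP16 saves memory but is not expected to run faster than FP32.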