Why is there such a huge performance gap between bfloat16, float16, and float32?

With float16, throughput was 0.279 tokens/s.

With bfloat16, throughput was 5.283 tokens/s.

With float32, throughput was 2.64 tokens/s.
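
To put the gap in numbers, here is a small sketch that computes each dtype's throughput relative to float16. It uses only the three measurements above; the helper name `relative_speed` is made up for illustration.

```python
# Throughput figures reported above, in tokens per second.
throughput = {"float16": 0.279, "bfloat16": 5.283, "float32": 2.64}


def relative_speed(results, baseline):
    """Return each entry's throughput divided by the baseline's throughput."""
    base = results[baseline]
    return {dtype: tps / base for dtype, tps in results.items()}


# Express every measurement as a multiple of the float16 result.
ratios = relative_speed(throughput, "float16")
for dtype, ratio in ratios.items():
    print(f"{dtype}: {ratio:.1f}x the float16 throughput")
```

By this measure bfloat16 is roughly 19 times faster than float16 and about twice as fast as float32, while float32 is itself more than 9 times faster than float16.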