Why doesn't bf16 need loss scaling?

J_Johnson · April 4, 2023, 1:16pm

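bfloat16 can go all the way down to ~1.2e-38 (its smallest normal value, the same as float32), whereas float16's smallest normal value is ~6.1e-5 and even its subnormals bottom out at ~6e-8. Does it make sense why that might be beneficial? Many gradient values can fall below float16's threshold and underflow to zero. Loss scaling works around that by multiplying the loss, and therefore the gradients, by a large factor before the backward pass. bfloat16 has the same 8-bit exponent as float32, so those small values stay representable without any scaling.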
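To make the range difference concrete, here is a minimal sketch in plain Python (no ML framework assumed). It derives each format's limits from its exponent and mantissa bit widths, then rounds one small value into both formats. `format_limits`, `to_fp16` and `to_bf16` are hypothetical helpers written for this illustration. `to_bf16` emulates bfloat16 by keeping the top 16 bits of the float32 bit pattern with round-to-nearest-even, and only handles finite inputs. The gradient value `1e-8` and the scale factor `2**16` are illustrative assumptions, not values taken from a real model.

```python
import struct


def format_limits(exp_bits: int, man_bits: int) -> dict:
    """Smallest and largest finite magnitudes of an IEEE-754-style binary format."""
    bias = 2 ** (exp_bits - 1) - 1
    return {
        "min_subnormal": 2.0 ** (1 - bias - man_bits),
        "min_normal": 2.0 ** (1 - bias),
        "max": (2 - 2.0 ** -man_bits) * 2.0 ** bias,
    }


def to_fp16(x: float) -> float:
    # struct's "e" format is IEEE 754 binary16; tiny values round to a subnormal or to 0.0
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bf16(x: float) -> float:
    # Hypothetical emulation: round float32 bits to their top 16 bits (nearest-even).
    # Finite inputs only; NaN is not handled specially.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (((bits + rounding_bias) & 0xFFFFFFFF) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits))[0]


FP16 = format_limits(exp_bits=5, man_bits=10)
BF16 = format_limits(exp_bits=8, man_bits=7)

if __name__ == "__main__":
    print("float16 :", FP16)
    print("bfloat16:", BF16)

    grad = 1e-8          # illustrative small gradient value
    scale = 2.0 ** 16    # illustrative loss-scale factor
    print("fp16, unscaled:", to_fp16(grad))          # underflows to 0.0
    print("fp16, scaled:  ", to_fp16(grad * scale))  # survives after scaling
    print("bf16, unscaled:", to_bf16(grad))          # survives without scaling
```

The trade-off is precision: bfloat16 keeps only 7 explicit mantissa bits versus float16's 10, so it gives up resolution to gain float32's exponent range. That exponent range is what removes the need to rescale gradients.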