Why does bf16 not need loss scaling?

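bfloat16 has the same 8-bit exponent as float32, so it can go all the way down to ~1e-38, whereas float16's smallest representable (subnormal) value is ~6e-8. Loss scaling exists to keep small gradients from underflowing to zero in float16. Does it make sense why the wider range might be beneficial when many gradient values (and some parameters) can be below that float16 threshold?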
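To make the difference concrete, here is a small sketch using only the Python standard library. It is an illustration under assumptions, not how any framework implements these types: the helper names `to_float16` and `to_bfloat16` are made up for this example, bfloat16 is emulated by truncating a float32 to its top 16 bits (real hardware usually rounds to nearest), and the gradient value and scale factor are hypothetical.

```python
import struct

def to_float16(x: float) -> float:
    """Round x to IEEE float16 and back, using struct's half-precision 'e' format."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_bfloat16(x: float) -> float:
    """Emulate bfloat16 by keeping only the top 16 bits of the float32 encoding.

    Assumption: this truncates instead of rounding to nearest, which is
    enough to show the dynamic range but not bit-exact with real hardware.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

grad = 1e-9          # hypothetical small gradient value
scale = 2.0 ** 16    # hypothetical loss-scale factor

print("float16 :", to_float16(grad))           # underflows to 0.0
print("bfloat16:", to_bfloat16(grad))          # still nonzero, with fewer significant digits
print("float16 with scaling:", to_float16(grad * scale))  # nonzero again after scaling
```

The value 1e-9 is lost entirely in float16 unless it is scaled up first, while the emulated bfloat16 keeps it (at lower precision) with no scaling at all.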
