While reading about mixed precision training, I came across the following:
Using mixed precision training requires three steps:
1/ Converting the model to use the float16 data type where possible.
2/ Keeping float32 master weights to accumulate per-iteration weight updates.
3/ Using loss scaling to preserve small gradient values.
What does “master weights” mean exactly?
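
To make the question concrete, here is my rough understanding of the three steps as a manual PyTorch sketch (the names `master_params` and `loss_scale` are just mine, the scale is static for simplicity, and I know `torch.cuda.amp` automates all of this in practice):

```python
import torch

model = torch.nn.Linear(10, 1).cuda().half()  # step 1: run the model in float16

# Step 2 (my guess): keep a float32 copy of every parameter for the optimizer.
master_params = [p.detach().clone().float().requires_grad_(True)
                 for p in model.parameters()]
optimizer = torch.optim.SGD(master_params, lr=1e-3)  # updates the float32 copies

loss_scale = 1024.0  # step 3: scale the loss so small gradients survive in float16

x = torch.randn(8, 10, device="cuda", dtype=torch.float16)
y = torch.randn(8, 1, device="cuda", dtype=torch.float16)

loss = torch.nn.functional.mse_loss(model(x), y)
(loss * loss_scale).backward()  # forward/backward run in float16

# Unscale the float16 gradients into the float32 master copies, then step.
for p, mp in zip(model.parameters(), master_params):
    mp.grad = p.grad.float() / loss_scale
optimizer.step()
model.zero_grad()

# Copy the updated float32 master weights back into the float16 model.
with torch.no_grad():
    for p, mp in zip(model.parameters(), master_params):
        p.copy_(mp.half())
```

If I read step 2 right, the “master weights” would be the float32 copies that the optimizer actually updates, while the float16 parameters are only used for the forward and backward passes. Is that correct?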
Thanks!