I'm trying to apply gradient compression together with mixed-precision training (apex amp, opt_level = "O2" with dynamic loss scaling), but I run into an error in the middle of training:
File "/home/anaconda3/lib/python3.7/site-packages/ray/tune/function_runner.py", line 248, in run
self._entrypoint()
File "/home/anaconda3/lib/python3.7/site-packages/ray/tune/function_runner.py", line 316, in entrypoint
self._status_reporter.get_checkpoint())
File "/home/anaconda3/lib/python3.7/site-packages/ray/tune/function_runner.py", line 575, in _trainable_func
output = fn()
File "/home/resnetTraceDDP.py", line 634, in train
scaled_loss.backward() # calculate the gradients
File "/home/anaconda3/lib/python3.7/contextlib.py", line 119, in __exit__
next(self.gen)
File "/home/anaconda3/lib/python3.7/site-packages/apex/amp/handle.py", line 123, in scale_loss
optimizer._post_amp_backward(loss_scaler)
File "/home/anaconda3/lib/python3.7/site-packages/apex/amp/_process_optimizer.py", line 190, in post_backward_with_master_weights
models_are_masters=False)
File "/home/anaconda3/lib/python3.7/site-packages/apex/amp/scaler.py", line 119, in unscale
self.unscale_python(model_grads, master_grads, scale)
File "/home/anaconda3/lib/python3.7/site-packages/apex/amp/scaler.py", line 88, in unscale_python
1./scale,
ZeroDivisionError: float division by zero
I'm applying this to a ResNet model (ResNet-50, for example), and all the hyperparameter values are reasonable. I'm not entirely sure how to fix this. Would switching to a static loss scale work? I was also wondering what the difference is between dynamic and static loss scaling, and whether that choice would lead to differences in runtime or accuracy.
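For context, this is roughly how I understand the two options would be set up with apex (a simplified sketch, not my actual training script; `loader` and the hyperparameter values are placeholders):

```python
import torch
import torch.nn.functional as F
import torchvision.models as models
from apex import amp

model = models.resnet50().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Dynamic loss scaling (what I'm currently using): apex starts from a large
# scale and reduces it whenever inf/NaN gradients are detected.
model, optimizer = amp.initialize(model, optimizer, opt_level="O2",
                                  loss_scale="dynamic")

# Static loss scaling: the scale stays fixed for the whole run, so it cannot
# be driven toward zero, but the value has to be chosen so gradients neither
# overflow nor underflow (powers of two like 128 or 512 seem common).
# model, optimizer = amp.initialize(model, optimizer, opt_level="O2",
#                                   loss_scale=128.0)

for inputs, targets in loader:  # placeholder data loader
    optimizer.zero_grad()
    loss = F.cross_entropy(model(inputs.cuda()), targets.cuda())
    # backward() is called on the scaled loss, as in my failing line
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()
```

Is switching from `loss_scale="dynamic"` to a fixed value like this the right way to work around the division by zero, or does it just hide whatever is producing the bad gradients?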