Hello,
I wanted to try a few training recipes in fixed precision
- looking for documentation on how different modules in a model should be converted correctly
my current setup just uses `model.half()` or `model.to(torch.float16)`. It has not thrown any blocking errors, but I would like to set up my code correctly for future use
- loss scaling:
- current LRs for different high-level modules are set to a known and stabilised config that works in single precision and in mixed precision with `float16`/`bfloat16`
- [edit] ofc I don't expect to keep these as final LRs in half precision, only using them as a starting point
- when running half precision, at step 2 the feature outputs jump to `nan` from the early layers
- looking to use a scaler for the loss function at this stage, but `amp.GradScaler` returns:
```
  File ".../trainers.py", line 92, in run_step
    self.scaler.unscale_(self.optimizer)
  File ".../python3.12/site-packages/torch/amp/grad_scaler.py", line 342, in unscale_
    optimizer_state["found_inf_per_device"] = self._unscale_grads_(
                                              ^^^^^^^^^^^^^^^^^^^^^
  File ".../python3.12/site-packages/torch/amp/grad_scaler.py", line 264, in _unscale_grads_
    raise ValueError("Attempting to unscale FP16 gradients.")
ValueError: Attempting to unscale FP16 gradients.
```
- scaling code:
```python
self.scaler.scale(loss).backward()
self.scaler.unscale_(self.optimizer)
if self.cfg.clip_grad is not None:
    torch.nn.utils.clip_grad_norm_(
        self.model.parameters(), self.cfg.clip_grad
    )
self.scaler.step(self.optimizer)
# When enable amp, optimizer.step call are skipped if the loss scaling factor is too large.
# Fix torch warning scheduler step before optimizer step.
scaler = self.scaler.get_scale()
self.scaler.update()
if scaler <= self.scaler.get_scale():
    self.scheduler.step()
```
- What would be the correct way to implement loss scaling in such a configuration with any PyTorch-based modules?