I suspect that during training, a small number of my model's parameters change drastically from batch to batch in certain epochs (exploding gradients). It does not happen all the time. Other than printing out the gradients at those epochs/batches (which might be infeasible for large models), is there a simple way to detect this? For example, getting a warning when a gradient becomes “larger than usual”?
If you’re using something like Adam, a running average of the squared gradients (roughly, the gradient variance) is already tracked, so I’d consider using that information for tracking. For a crude fix, torch.nn.utils has gradient clipping functions (clip_grad_norm_ and clip_grad_value_). And IIRC TensorBoard can plot gradient densities over epochs (for tracked params/tensors).
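To make the “larger than usual” warning concrete, here is a minimal sketch of a monitor that compares each gradient norm with an exponential moving average of the earlier ones. The class name `GradNormMonitor` and its parameters (`beta`, `factor`, `warmup`) are made up for illustration, and the default values are guesses that would need tuning.

```python
import math


class GradNormMonitor:
    """Flags a gradient norm that is far above its running average."""

    def __init__(self, beta=0.9, factor=5.0, warmup=20):
        self.beta = beta      # smoothing factor of the moving average
        self.factor = factor  # "spike" means norm > factor * running average
        self.warmup = warmup  # number of values to observe before flagging
        self.mean = 0.0
        self.steps = 0

    def update(self, norm):
        """Record one gradient norm; return True if it looks like a spike."""
        norm = float(norm)
        if not math.isfinite(norm):
            return True  # NaN/inf is always reported and kept out of the average
        spike = self.steps >= self.warmup and norm > self.factor * self.mean
        if not spike:
            # Spikes are left out of the average so one outlier does not
            # raise the baseline that later values are compared against.
            if self.steps == 0:
                self.mean = norm
            else:
                self.mean = self.beta * self.mean + (1.0 - self.beta) * norm
            self.steps += 1
        return spike


# Toy usage with hand-written norms; the fifth value is an obvious outlier.
monitor = GradNormMonitor(factor=5.0, warmup=3)
flags = [monitor.update(v) for v in [1.0, 1.1, 0.9, 1.0, 50.0, 1.0]]
```

In a real training loop the value passed to `update` could be the return value of `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)`, which is the total gradient norm measured before clipping, so clipping and monitoring share one call. One trade-off of this sketch: because spikes are excluded from the average, a lasting shift to larger gradients would keep being flagged.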
I am using SGD. Is the variance tracked there as well?
And isn’t TensorBoard part of TensorFlow? I’m a little confused.
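No, plain SGD does not track it; only adaptive optimizers do. Actually, switching to one may solve your problem by itself, since they divide each step by a running estimate of the gradient magnitude, which damps large steps strongly. RMSprop may be the least intrusive one.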
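As a rough illustration of using the statistics an adaptive optimizer already keeps, the sketch below compares the current gradient with Adam's stored running average of squared gradients. It assumes `torch.optim.Adam`, whose per-parameter state holds `step` and `exp_avg_sq`; the helper name `grad_to_rms_ratio` and the toy "bad batch" are made up for the example.

```python
import torch


def grad_to_rms_ratio(optimizer):
    """Largest |grad| / sqrt(bias-corrected exp_avg_sq) over all parameters.

    Call after backward() and before optimizer.step(), so the current
    gradient is compared with the history of earlier steps only.
    Returns None while the optimizer has no state yet (before the first step).
    """
    worst = None
    for group in optimizer.param_groups:
        beta2 = group["betas"][1]
        for p in group["params"]:
            state = optimizer.state.get(p)
            if p.grad is None or not state:
                continue
            step = float(state["step"])
            # Undo the bias toward zero that the running average has early on.
            v_hat = state["exp_avg_sq"] / (1.0 - beta2 ** step)
            ratio = (p.grad.abs() / (v_hat.sqrt() + 1e-12)).max().item()
            worst = ratio if worst is None else max(worst, ratio)
    return worst


torch.manual_seed(0)
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(32, 4), torch.randn(32, 1)

ratios = []
for step in range(21):
    scale = 1000.0 if step == 20 else 1.0  # hypothetical bad batch at the end
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(scale * x), y)
    loss.backward()
    ratios.append(grad_to_rms_ratio(optimizer))
    optimizer.step()
```

A ratio near 1 means the gradient is in line with its recent history, while a much larger value points at the kind of jump described in the question. With `torch.optim.RMSprop` the corresponding state entry is `square_avg`, so the helper would need adapting.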
Yes, but PyTorch supports writing its log files (see torch.utils.tensorboard). I re-checked my older code that produced gradient plots, and it is a bit non-trivial: I used TensorboardLogger and events from the ignite library, and Tensor.register_hook callbacks to record gradients for exporting.
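This is not the older code mentioned above, only a minimal sketch of the `Tensor.register_hook` part: each trainable parameter gets a hook that stores the norm of its gradient on every backward pass. The helper `track_grad_norms` and the `history` dictionary are hypothetical names, and the ignite/TensorboardLogger wiring is left out.

```python
from collections import defaultdict

import torch


def track_grad_norms(model, history):
    """Attach a hook to every trainable parameter that appends the norm of
    its gradient to history[name]. Returns the hook handles for later removal."""
    handles = []
    for name, param in model.named_parameters():
        if param.requires_grad:
            # Bind the name as a default argument so each hook keeps its own.
            def hook(grad, name=name):
                history[name].append(grad.norm().item())
                # Returning None leaves the gradient unchanged.

            handles.append(param.register_hook(hook))
    return handles


torch.manual_seed(0)
model = torch.nn.Linear(4, 1)
history = defaultdict(list)
handles = track_grad_norms(model, history)

x, y = torch.randn(8, 4), torch.randn(8, 1)
for _ in range(3):
    model.zero_grad()
    torch.nn.functional.mse_loss(model(x), y).backward()

# Remove the hooks once recording is no longer needed.
for handle in handles:
    handle.remove()
```

The recorded values could then be exported for plotting, for example with `add_scalar` or `add_histogram` on a `torch.utils.tensorboard.SummaryWriter`, assuming the tensorboard package is installed.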