I think the canonical reference for finding bad gradients is this snippet by Adam Paszke:
It checks for NaN (using the fact that x != x holds if and only if x is NaN) and for very large gradients, but you could easily adapt is_bad_grad to best fit your purpose.
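
To make the idea concrete, here is a minimal sketch of such a check. It is not the original snippet: it works on a plain iterable of Python floats rather than on tensors, and the `threshold` parameter with its default of 1e6, as well as comparing the absolute value, are my own assumptions about what "very large" should mean.

```python
def is_bad_grad(values, threshold=1e6):
    """Return True if any value is NaN or exceeds threshold in magnitude.

    Illustrative only: `values` is any iterable of floats, and
    `threshold` is an assumed cutoff, not a value taken from the
    original snippet.
    """
    for x in values:
        # NaN is the only float that compares unequal to itself.
        if x != x:
            return True
        # Treat very large magnitudes (including +/- infinity) as bad.
        if abs(x) > threshold:
            return True
    return False


print(is_bad_grad([0.1, -2.0, 3.5]))        # ordinary values
print(is_bad_grad([0.1, float("nan")]))     # contains NaN
print(is_bad_grad([0.1, 1e9]))              # contains a huge value
```

For a PyTorch tensor you could feed it something like grad.flatten().tolist(), although for anything but small tensors you would want to express the same two tests with tensor operations instead of a Python loop.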
Best regards
Thomas