Gradient value is NaN

Perhaps this is due to exploding gradients? I'd recommend first trying gradient clipping and seeing how training goes.
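To make "gradient clipping" concrete, here is a minimal framework-free sketch of clipping by global L2 norm. It assumes gradients are plain lists of floats, and the function name `clip_grad_norm` is hypothetical. It rescales every gradient by the same factor whenever their combined norm exceeds `max_norm`, and it raises on a non-finite norm, because clipping cannot repair values that are already NaN or inf.

```python
import math
from typing import List


def clip_grad_norm(grads: List[List[float]], max_norm: float, eps: float = 1e-6) -> float:
    """Hypothetical helper: scale all gradients in place so their combined
    L2 norm is at most max_norm. Returns the norm measured before clipping."""
    # Global norm over every gradient entry, treated as one long vector.
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    if not math.isfinite(total_norm):
        # NaN/inf already present: scaling won't fix it, so surface the problem.
        raise ValueError(f"non-finite gradient norm: {total_norm}")
    clip_coef = max_norm / (total_norm + eps)
    if clip_coef < 1.0:
        # Shrink every entry by the same factor, preserving the gradient direction.
        for grad in grads:
            for i in range(len(grad)):
                grad[i] *= clip_coef
    return total_norm


grads = [[30.0, 40.0], [0.0]]  # global norm is 50
before = clip_grad_norm(grads, max_norm=5.0)
print(before, grads)  # grads are scaled to roughly [[3.0, 4.0], [0.0]]
```

If you are using PyTorch (an assumption, since the question doesn't say), the built-in equivalent is `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)`, called after `loss.backward()` and before `optimizer.step()`. It also returns the total norm, which is worth logging to see whether the norm actually blows up before the NaNs appear. The right `max_norm` depends on your model; treat any starting value as something to tune.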
