Proper way to do gradient clipping?

I tested this on CPU and the improvement was no more than a few milliseconds (for anyone who may try to implement an LSTM for benchmarking :slight_smile:). I think a few extra additions are insignificant compared with the more expensive computations, such as weight-matrix multiplications, nonlinear activation functions, or even the Python loop itself.
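If anyone wants to check this on their own machine, here is a rough sketch of the kind of comparison I mean. It is not my original benchmark: it uses plain Python lists rather than a real framework, and the names `matvec`, `clip_values`, `time_call` and the hidden size of 256 are made up for illustration. It also assumes the "extra work" is an element-wise clamp, which costs O(n), while a single weight-matrix product costs O(n²).

```python
import random
import time


def matvec(weights, vec):
    """Dense matrix-vector product: O(rows * cols) multiply-adds."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def clip_values(values, limit):
    """Element-wise clipping to [-limit, limit]: O(n) comparisons."""
    return [max(-limit, min(limit, v)) for v in values]


def time_call(fn, *args, repeats=20):
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    random.seed(0)
    n = 256  # hypothetical hidden size, chosen only for illustration
    weights = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
    vec = [random.uniform(-1, 1) for _ in range(n)]
    grads = [random.uniform(-10, 10) for _ in range(n)]

    # Compare one matrix-vector product against one clipping pass.
    print("matvec:", time_call(matvec, weights, vec))
    print("clip:  ", time_call(clip_values, grads, 5.0))
```

The absolute numbers will depend on the machine and on the framework you actually use, so treat the output only as a rough indication of the relative cost.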
