Proper way to do gradient clipping?

kim.seonghyeon · January 25, 2017, 11:34pm

I tested this on CPU and the gain was no more than a few milliseconds (noting this for anyone who tries implementing an LSTM for benchmarking). I think a few extra additions are insignificant compared with the more expensive computations, such as multiplying the weight matrices, applying the nonlinear activation functions, or even running the Python loop itself.
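To put rough numbers on that, here is a minimal back-of-the-envelope sketch. It assumes the "extra addition" is one elementwise add over the gate pre-activations of an LSTM step. The function names and the sizes (batch 32, input 128, hidden 256) are hypothetical, chosen only for illustration. It counts arithmetic operations, not wall-clock time, so treat it as an estimate rather than a benchmark.

```python
def lstm_step_matmul_flops(batch, input_size, hidden_size):
    """Rough FLOP count for the two weight-matrix products in one LSTM step."""
    gates = 4 * hidden_size  # input, forget, cell, and output gates
    # Each output element costs one multiply and one add per inner-dimension element.
    input_proj = 2 * batch * input_size * gates
    hidden_proj = 2 * batch * hidden_size * gates
    return input_proj + hidden_proj


def extra_add_flops(batch, hidden_size):
    """FLOPs for one extra elementwise addition over the gate pre-activations."""
    return batch * 4 * hidden_size


# Hypothetical sizes, not taken from the original benchmark.
batch, input_size, hidden_size = 32, 128, 256

matmul = lstm_step_matmul_flops(batch, input_size, hidden_size)
extra = extra_add_flops(batch, hidden_size)
ratio = matmul / extra

print(f"matmul FLOPs: {matmul}, extra add FLOPs: {extra}, ratio: {ratio:.0f}x")
```

Under these assumptions the ratio reduces to 2 * (input_size + hidden_size), so the extra addition gets relatively cheaper as the layers get wider, which fits with only seeing a difference of a few milliseconds.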