Gradient clipping is not working as expected

Hi all, I have an exploding gradient problem when training for 150-200 epochs with batch size = 256 and about 30-60 minibatches (this depends on my specific config). I still get exploding gradients even after adding the code below.
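Roughly, the clipping is applied like this (a minimal sketch with a stand-in model and random data, not my real config):

```python
import torch
import torch.nn as nn

# Stand-in model, optimizer, and data; the real ones in my config are different.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()
inputs, targets = torch.randn(256, 10), torch.randn(256, 1)

optimizer.zero_grad()
loss = criterion(model(inputs), targets)
loss.backward()

# Clamp every gradient element into [-100, 100] after backward() and before step().
torch.nn.utils.clip_grad_value_(model.parameters(), clip_value=100)
optimizer.step()
```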

As you can see in the images below, at around step 40k the gradients swing between roughly ±20k, ±40k, and ±60k respectively. I don't know why this happens, because I use clip_grad_value_ as shown above. I am also using learning-rate decay, from 0.01 to about 0.008 at step 40k.

Or do I need to update the weight parameters myself, something like this?
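For example, a manual update in place of optimizer.step() would roughly be (a hypothetical sketch; the model and learning rate here are only placeholders):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)   # stand-in for the real model
lr = 0.01                  # stand-in learning rate

# ... after loss.backward() and clip_grad_value_(...) ...
with torch.no_grad():
    for p in model.parameters():
        if p.grad is not None:
            p -= lr * p.grad   # manual SGD-style update instead of optimizer.step()
```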
But I think optimizer.step() should do the job, and clip_grad_value_ is an in-place operation, so I don't need to use the return value of the function. Please correct me if I did anything wrong. Thank you very much.

Best regards,
Mint

Hi, it seems correct to me; however, perhaps your norm bound is too high? What happens when you lower it from 100 to 10 or so?

Hi, I've decreased it to ±1 and it still has this problem. But there are two different ways of computing the bound: by norm or by value. Maybe using the value is the cause of the problem; I'm not sure about this.
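For reference, the two variants look like this (a minimal sketch; the thresholds are example values, and in practice you would pick one or the other, not both):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
model(torch.randn(4, 10)).sum().backward()

# By norm: rescale all gradients together so their total L2 norm is at most 1.0;
# relative directions between gradient elements are preserved.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

# By value: clamp each gradient element into [-1, 1] independently;
# this can change the direction of the overall gradient vector.
torch.nn.utils.clip_grad_value_(model.parameters(), clip_value=1.0)
```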

Any solution? I still have this problem.
Thank you

It seems correct to me. Can you, for example, try printing the L2 norm of gradients over iterations to see if the code is really working as expected?
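For example, something along these lines inside the training loop (a minimal sketch; the model and step names are whatever your loop uses):

```python
import torch

def grad_l2_norm(parameters):
    # Total L2 norm over all parameter gradients (the same quantity clip_grad_norm_ uses).
    grads = [p.grad.detach().flatten() for p in parameters if p.grad is not None]
    return torch.cat(grads).norm(2).item() if grads else 0.0

# In the training loop, after loss.backward() and after the clipping call:
#     print(step, grad_l2_norm(model.parameters()))
```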