Gradient value clipping not working

Hello everyone! I am trying to train a simple 2-layer MLP on tabular input for a reinforcement learning task, with ReLU activations between the layers and no activation at the output (the network predicts log scores). Because of the ReLU activations, training quickly becomes unstable and the loss shoots up very high. I have added gradient clipping to stabilize training, but it doesn't always work. I added the clipping as follows:

torch.nn.utils.clip_grad_value_(doc_scorer.parameters(), 0.9)  # clamp every gradient element to [-0.9, 0.9]
optimizer.step()
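For context, here is a minimal sketch of where that call sits in a full training step. It assumes plain SGD and an MSE loss on random data; the layer sizes (16 inputs, 64 hidden units), the optimizer, the learning rate and the targets are all hypothetical, since the post does not give them. `doc_scorer` is read here as two `nn.Linear` layers with a ReLU between them and no output activation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for doc_scorer: two Linear layers, ReLU in between,
# no activation on the output (it emits unbounded log scores).
# The sizes 16 and 64 are assumptions; the post does not state them.
doc_scorer = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.SGD(doc_scorer.parameters(), lr=0.01)  # assumed optimizer

# Hypothetical batch with large regression targets, only to produce big gradients.
x = torch.randn(32, 16)
target = torch.randn(32, 1) * 100.0

optimizer.zero_grad()                                  # drop gradients from the previous step
loss = nn.functional.mse_loss(doc_scorer(x), target)
loss.backward()                                        # fills p.grad for every parameter

# Clamp every gradient element into [-0.9, 0.9] in place. This has to run after
# backward() and before step(); anywhere else, the update uses unclipped gradients.
torch.nn.utils.clip_grad_value_(doc_scorer.parameters(), 0.9)

optimizer.step()                                       # update with the clipped gradients
```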

However, the gradient values are not being clipped, as can be seen in the attached wandb log.

I have read that wandb logs the gradients before the clipping operation is applied, but if gradient clipping were working properly, the network's parameters should not keep growing in magnitude over time.


Can anyone help me with any suggestions or pointers to solve this problem? Thanks.

Could you explain why the parameters should not grow in magnitude if clipping is working properly?
Also, did you check the actual gradient values after applying the clipping operation?
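One way to answer this directly is to compare the largest absolute gradient entry of each parameter before and after the clipping call. A minimal sketch, assuming a throwaway model and a deliberately inflated loss; `grad_abs_max` is a hypothetical helper, not a PyTorch function.

```python
import torch
import torch.nn as nn

def grad_abs_max(model: nn.Module) -> dict:
    """Hypothetical helper: largest absolute gradient entry per named parameter."""
    return {
        name: p.grad.abs().max().item()
        for name, p in model.named_parameters()
        if p.grad is not None
    }

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
loss = (model(torch.randn(32, 16)) * 1000.0).sum()   # deliberately huge loss
loss.backward()

before = grad_abs_max(model)                          # raw gradients
torch.nn.utils.clip_grad_value_(model.parameters(), 0.9)
after = grad_abs_max(model)                           # gradients the optimizer would see

for name in before:
    print(f"{name}: {before[name]:.3f} -> {after[name]:.3f}")
```

Checking the `.grad` tensors this way, rather than relying on values logged during the backward pass, shows what the optimizer actually receives.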

Thank you for your reply. I checked the values in the parameters' gradient tensors after clipping, and the clipping is working. I was suspicious because the parameter values kept increasing over the iterations. I realized this happened because the gradient values were all being clipped to the max/min bound (±0.9): each update, though bounded, was still nonzero, so the parameters kept drifting over time.
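A toy sketch of this effect, assuming plain SGD with lr=0.1 (the post does not name the optimizer or learning rate) and a hypothetical loss whose gradient is a constant -1000. Clipping caps each gradient entry at 0.9 in magnitude, so each SGD step moves the parameter by at most lr × 0.9 = 0.09, but when the gradient keeps the same sign those bounded steps add up:

```python
import torch

# One scalar "parameter" and a hypothetical loss whose gradient is always -1000,
# far outside the clip range.
p = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([p], lr=0.1)  # assumed optimizer and learning rate

for step in range(100):
    optimizer.zero_grad()
    loss = (-1000.0 * p).sum()
    loss.backward()                               # p.grad == -1000
    torch.nn.utils.clip_grad_value_([p], 0.9)     # p.grad clipped to -0.9
    optimizer.step()                              # p <- p - 0.1 * (-0.9), i.e. +0.09

print(p.item())   # about 9.0: 100 steps of +0.09 each
```

Clipping by value limits the size of each update, not the total distance the parameters travel, so steadily growing weights are consistent with clipping that works. With an adaptive optimizer such as Adam, the step size depends on the gradient relative to its running RMS, so gradients that stay saturated at the clip value still produce steps of roughly the learning rate.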