My parameters fall into groups that use different optimizers and learning rates. The groups are quite different, so I want to clip their gradients separately with clip_grad_norm_. I put the parameter groups into a list and passed that list to clip_grad_norm_, the same way you set per-group learning rates, but this does not seem to work for gradient clipping. The documentation says the argument must be an iterable of Tensors, or a single Tensor, whose gradients will be normalized. How can I do this when what I have is a list of parameter groups?
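For reference, here is a minimal sketch of what I am trying to do (the two-layer model and the learning rates are just placeholders). Since clip_grad_norm_ expects Tensors rather than the group dicts the optimizer stores, I extract each group's "params" entry and clip one group at a time:

```python
import torch
import torch.nn as nn

# Placeholder model: two parts that get different learning rates.
model = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 2))

param_groups = [
    {"params": model[0].parameters(), "lr": 1e-3},
    {"params": model[1].parameters(), "lr": 1e-4},
]
optimizer = torch.optim.SGD(param_groups)

# Dummy forward/backward so the parameters have gradients.
x = torch.randn(3, 4)
loss = model(x).sum()
loss.backward()

# clip_grad_norm_ wants an iterable of Tensors, so pass each
# group's "params" list, not the group dicts themselves.
# Each group's total gradient norm is clipped to max_norm independently.
for group in optimizer.param_groups:
    torch.nn.utils.clip_grad_norm_(group["params"], max_norm=1.0)

optimizer.step()
```

Is iterating over optimizer.param_groups like this the intended way to clip each group separately?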