Is it possible to set different learning rates for different parts of a tensor?
That’s not possible, as you would hit the same limitations explained in your other topic.
So the only way to have different learning rates is to manually scale the gradients in the backward pass, right?
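The gradient-scaling idea mentioned above could be sketched like this, using `register_hook` to multiply the gradient elementwise before the optimizer step (the tensor sizes and scale factors are made up for illustration):

```python
import torch

# Hypothetical example: a 4-element weight where the first two elements
# should learn 10x faster than the last two.
weight = torch.zeros(4, requires_grad=True)

# Per-element scale factors (an assumption for illustration).
lr_scale = torch.tensor([10.0, 10.0, 1.0, 1.0])

# The hook scales the incoming gradient elementwise, so a plain SGD step
# effectively applies a different learning rate per element.
weight.register_hook(lambda grad: grad * lr_scale)

optimizer = torch.optim.SGD([weight], lr=0.1)

loss = (weight - 1.0).pow(2).sum()
loss.backward()
optimizer.step()
# The first two elements moved 10x further than the last two.
```

Note that adaptive optimizers such as Adam normalize the gradient magnitude internally, so this scaling trick only maps cleanly onto a learning-rate change for plain SGD.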
No, you could also use different (sub-)tensors and stack/concatenate them in the actual forward pass, as explained e.g. here.
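A minimal sketch of that approach, assuming a hypothetical `SplitLinear` module: the weight is stored as two separately registered `nn.Parameter`s, concatenated in `forward`, and each sub-tensor is placed in its own optimizer param group with its own learning rate:

```python
import torch
import torch.nn as nn

class SplitLinear(nn.Module):
    """Linear layer whose weight is built from two sub-tensors."""

    def __init__(self, in_features, out_features, split):
        super().__init__()
        # Two separately registered slices of the weight matrix.
        self.w_fast = nn.Parameter(torch.randn(split, in_features))
        self.w_slow = nn.Parameter(torch.randn(out_features - split, in_features))

    def forward(self, x):
        # Stitch the full weight together for the actual computation;
        # autograd routes gradients back to each sub-tensor separately.
        weight = torch.cat([self.w_fast, self.w_slow], dim=0)
        return x @ weight.t()

model = SplitLinear(in_features=8, out_features=4, split=2)

# Each sub-tensor gets its own learning rate via optimizer param groups.
optimizer = torch.optim.SGD([
    {"params": [model.w_fast], "lr": 1e-2},
    {"params": [model.w_slow], "lr": 1e-4},
])

out = model(torch.randn(3, 8))
out.sum().backward()
optimizer.step()
```

Since the sub-tensors are ordinary parameters, this works with any optimizer (including adaptive ones), at the cost of one `torch.cat` per forward pass.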