Is it possible to set different learning rates for different parts of a tensor?
So the only way to have different learning rates within one tensor is to manually scale the gradients in the backward pass. Right?
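That gradient-scaling idea can be done with a tensor hook, which rewrites the gradient before the optimizer sees it. A minimal sketch (the shapes, the `0.1` multiplier, and the choice of scaling the first two entries are illustrative, not from the thread):

```python
import torch

# Per-element learning-rate multiplier: first two entries train 10x slower.
w = torch.tensor([1.0, 1.0, 1.0, 1.0], requires_grad=True)
scale = torch.tensor([0.1, 0.1, 1.0, 1.0])

# The hook's return value replaces w's gradient during backward.
w.register_hook(lambda grad: grad * scale)

opt = torch.optim.SGD([w], lr=1.0)
loss = (w ** 2).sum()   # d(loss)/dw = 2*w = [2, 2, 2, 2] before scaling
loss.backward()
opt.step()
```

Since the hook scales the gradient itself, note that this also changes what adaptive optimizers like Adam see, so it is only exactly equivalent to a per-element learning rate for plain SGD.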
No, you could also keep the parts as separate (sub-)tensors and stack/concatenate them in the forward pass, as explained e.g. here.
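A sketch of that split-and-concatenate approach (module and parameter names are made up for illustration): each part is its own `nn.Parameter`, assigned its own learning rate through optimizer parameter groups, and the full tensor is rebuilt with `torch.cat` in `forward`:

```python
import torch
import torch.nn as nn

class SplitWeight(nn.Module):
    def __init__(self):
        super().__init__()
        # Two halves of what is logically one weight vector of size 4.
        self.w_fast = nn.Parameter(torch.ones(2))
        self.w_slow = nn.Parameter(torch.ones(2))

    def forward(self, x):
        # Rebuild the full tensor each forward pass.
        w = torch.cat([self.w_fast, self.w_slow])
        return x @ w

m = SplitWeight()
# One param group per part, each with its own learning rate.
opt = torch.optim.SGD([
    {"params": [m.w_fast], "lr": 1.0},
    {"params": [m.w_slow], "lr": 0.1},
])
loss = m(torch.ones(3, 4)).sum()  # grad w.r.t. every weight entry is 3
loss.backward()
opt.step()
```

Unlike gradient scaling, this works unchanged with any optimizer, since the per-group `lr` is a first-class optimizer setting.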