I have seen that it is possible to set a different learning rate for a layer or group of parameters using the code below.

```
optim.SGD([
    {'params': mylayer.weight},
    {'params': mylayer.bias, 'lr': 1e-3},
], lr=1e-2, momentum=0.9)
```
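
For reference, here is a self-contained version of that snippet (the layer and its sizes are made up for illustration), showing that each group carries its own scalar lr:

```python
import torch.nn as nn
import torch.optim as optim

# Hypothetical layer, just to make the example runnable end to end.
mylayer = nn.Linear(4, 2)

optimizer = optim.SGD([
    {'params': [mylayer.weight]},            # inherits the default lr below
    {'params': [mylayer.bias], 'lr': 1e-3},  # per-group override
], lr=1e-2, momentum=0.9)

# Each param group keeps its own scalar learning rate.
print([g['lr'] for g in optimizer.param_groups])  # -> [0.01, 0.001]
```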

Is it possible to have a different learning rate per parameter, e.g.

```
optim.SGD([
    {'params': mylayer.weight, 'lr': [np.random.random() for i in range(len(mylayer.weight))]},
    {'params': mylayer.bias, 'lr': 1e-3},
], lr=1e-2, momentum=0.9)
```

?
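
For what it's worth, the closest I have come to element-wise rates is pre-scaling the gradient with a tensor hook and running SGD with `lr=1.0`, so the scaled gradient is applied as-is. This is a workaround rather than a real per-element `lr`, and the layer and rate tensor below are made up:

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
mylayer = nn.Linear(4, 2)  # hypothetical layer for illustration

# One rate per weight element, folded into the gradient by a hook.
elementwise_lr = torch.rand_like(mylayer.weight) * 1e-2
mylayer.weight.register_hook(lambda grad: grad * elementwise_lr)

optimizer = optim.SGD([
    {'params': [mylayer.weight], 'lr': 1.0},  # effective rate is elementwise_lr
    {'params': [mylayer.bias], 'lr': 1e-3},
])

mylayer(torch.ones(1, 4)).sum().backward()
before = mylayer.weight.detach().clone()
optimizer.step()

# The raw weight gradient here is all ones (d out / dW = input = ones),
# so the applied step equals elementwise_lr exactly.
print(torch.allclose(before - mylayer.weight.detach(), elementwise_lr))  # True
```

Note this only mimics plain SGD; with momentum or weight decay the scaling interacts with the buffers, so it is not equivalent to a true per-element learning rate there.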